Posted to api@directory.apache.org by Alex Karasulu <ak...@apache.org> on 2011/01/06 02:36:31 UTC

[DISCUSSION] General API & SPI Concerns

Hi all,

Excuse the cross post but this also has significance to the API list.

Problem
------------

For our benefit and the benefit of our users we need to be uber careful with
changes after a major GA release. We have another thread where it seems
people agree with the Eclipse scheme of versioning and this sounds really
flexible for our needs. We can do a 2.0.0-M1 release at any time without
clamping down on APIs. Only when we do an RC do we have to freeze changes to
interfaces.

The debate still remains as to what constitutes an interface. Emmanuel seems
to disagree with configuration, schema, and partition db formats as being
interfaces of concern but for the time being we can just discuss those we do
agree on. There's no doubt about APIs and SPIs.


Solution
------------

So how do we make this as painless as possible for us and our users?
The best way is to keep the surface area of the SPI or API small, create
solid boundaries, and avoid exposing implementation details and
implementation classes.
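
As a minimal sketch of what implementation hiding can look like in plain
Java (every name here is hypothetical and not taken from the actual shared
code): callers see only an interface and a factory, while the implementation
class cannot be referenced from outside.

```java
// Hypothetical sketch: these names are illustrative and do not mirror the real shared code.

/** The only type callers are meant to program against. */
interface DeleteRequestSketch {
    String getName();
}

/** Factory that hands out instances without exposing the implementation class. */
final class RequestSketches {
    private RequestSketches() {
    }

    static DeleteRequestSketch deleteRequest(String name) {
        return new DeleteRequestImpl(name);
    }

    // Private nested class: code outside RequestSketches cannot reference it,
    // so it can be changed or replaced without touching the interface.
    private static final class DeleteRequestImpl implements DeleteRequestSketch {
        private final String name;

        private DeleteRequestImpl(String name) {
            this.name = name;
        }

        @Override
        public String getName() {
            return name;
        }
    }
}
```

Since nothing outside the factory can name DeleteRequestImpl, swapping it
for another implementation would not be an API change under this layout.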

By reducing the surface area with implementation hiding we can effectively
limit exposure and reduce the probability of needing to make a change that
breaks with our user contract. You might be asking what's a real world
example of this for us in shared?

And incidentally this is one of the things I've been working on in my
branch.


Real World Example in Shared
--------------------------------------------

Let's take the o.a.d.s.ldap.message package as an example. This package
contains classes and interfaces modeling LDAP requests and responses: i.e.
AddRequest, DeleteResponse etc. It's in the shared-ldap module.

In this package, in addition to the request and response interfaces, we're
exposing implementation classes for them. The implementation classes, in
turn, have dependencies on o.a.d.s.ldap.codec.* packages. This is because
some implementation classes depend on codec functionality which is an
implementation detail. This might be due to eager reuse or the addition of
utility methods into codec classes for convenience. Some of these
dependencies can be removed by breaking out non-implementation specific
methods and constants in codec classes into utility methods outside of the
package or the module altogether. Furthermore the codec implementation
that handles [de]marshaling has to access package friendly (non-API) methods
on implementation classes while encoding.
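
To make the last point concrete, here is a rough sketch of an encoder that
needs nothing but a public getter, so it could live in a different package
or module from the implementation classes. The names are hypothetical, this
is not the real LdapEncoder, and it only writes a plain BER OCTET STRING
with a short-form length rather than a full LDAP message.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: names are illustrative, and this is not the real LdapEncoder.

/** Public model interface; the only thing the encoder needs to see. */
interface NamedRequestSketch {
    String getName();
}

final class OctetStringEncoderSketch {
    private OctetStringEncoderSketch() {
    }

    /**
     * Encodes the request name as a BER OCTET STRING (tag 0x04) using only
     * the public getter. Only short-form lengths (under 128 bytes) are
     * handled, to keep the sketch small.
     */
    static byte[] encodeName(NamedRequestSketch request) {
        byte[] value = request.getName().getBytes(StandardCharsets.UTF_8);
        if (value.length > 127) {
            throw new IllegalArgumentException("long-form lengths not covered by this sketch");
        }
        byte[] out = new byte[2 + value.length];
        out[0] = 0x04;                 // universal tag for OCTET STRING
        out[1] = (byte) value.length;  // short-form length
        System.arraycopy(value, 0, out, 2, value.length);
        return out;
    }
}
```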

In the end, dependencies upon further transitive dependencies are making us
expose almost all implementation classes in shared, and most can easily be
decoupled and hidden. It's effectively making everything in shared come
together in one big heap exposing way more than we want to.


LDAP Client API
------------------------

Everyone agrees that this API is very important to get right with a 1.0.
Right now this API pulls in several public interfaces directly from shared.
Those interfaces also pull in some implementation classes. The logical API
extends into shared this way. Effectively the majority of shared is exposed
by the client API. The client API does not end at its jar boundary.

All this exposure increases the chances of API change when all
implementation details are wide open and part of the client API.  And this
is what I'm trying to limit. There are ways we can decouple these
dependencies very nicely with a mixed bag of refactoring techniques while
breaking up shared-ldap into smaller, more coherent modules. The idea is to
expose the bare minimum of only what we need to expose. Yes the shared code
has become very stable over time but the most stability is in the interfaces
and if we only expose these instead of implementation classes then we'll
have an awesome API that may remain 1.X for a while and not require
deprecations as new functionality is introduced.


Finishing Up the Example
-------------------------------------

So what concrete things can we do?

The biggest step is to hide as many of the implementation classes as
possible. In my experimental branch I started by:

    (1) Moving out methods and constants in codec classes causing
unnecessary dependencies from message package classes and interfaces. There
was even a situation where StringTools, for example, depended on codec
classes, and virtually everything doing string related operations used
StringTools, thereby causing many interdependencies. It then becomes a web
of dependencies across packages.

    (2) Breaking up shared into multiple Maven modules so now there are the
following modules:

          o shared-util
          o asn1-api
          o asn1-ber
          o ldap-model
                 - name pkg
                 - message pkg (no impl classes)
                 - schema pkg
                 - cursor pkg
                 - filter pkg
                 - entry pkg
                 - constants pkg
          o ldap-codec (not complete)

The next step would be to make these artifacts into OSGi bundles. There will
be nothing special about it. I'm just going to leverage bundle packaging,
with its explicit package exports, to hide implementation classes, which you
cannot do as easily with regular jars.

Once this is done, we can export a minimal set of classes from the codec,
hide its remainder, and have the model interfaces be the primary dependency
used by the client API without exposing implementation classes and keeping
the API weight (surface area) down.
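
For illustration only, a bundle manifest that exports a single API package
might look like the made-up one embedded in the sketch below; packages not
listed in Export-Package are the ones a bundle keeps to itself. The bundle
and package names are hypothetical, and the parsing is deliberately naive.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.jar.Manifest;

// Hypothetical sketch: bundle and package names below are made up for illustration.
final class ExportedPackagesSketch {
    private ExportedPackagesSketch() {
    }

    /** A made-up manifest that exports one API package and nothing else. */
    static final String SAMPLE_MANIFEST =
            "Manifest-Version: 1.0\n"
            + "Bundle-ManifestVersion: 2\n"
            + "Bundle-SymbolicName: example.ldap.codec\n"
            + "Bundle-Version: 1.0.0\n"
            + "Export-Package: example.ldap.codec.api;version=\"1.0.0\"\n";

    /**
     * Returns the package names listed in Export-Package. Parsing is naive:
     * it splits on commas and drops everything after the first ';', so it
     * would mishandle quoted values that contain commas.
     */
    static Set<String> exportedPackages(String manifestText) {
        Manifest manifest;
        try {
            manifest = new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String header = manifest.getMainAttributes().getValue("Export-Package");
        Set<String> packages = new LinkedHashSet<>();
        if (header == null) {
            return packages;
        }
        for (String clause : header.split(",")) {
            String name = clause.split(";")[0].trim();
            if (!name.isEmpty()) {
                packages.add(name);
            }
        }
        return packages;
    }
}
```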

There's a lot more to do; the job is 40% complete. The wait for the AP merge
makes this work feel moot since the merge is going to be nasty, so I might
just redo this again after Emmanuel merges. That lets me be a bit more
aggressive and experimental for now.

Plus if Pierre and Seelmann decide to opt for using m2eclipse+Maven+Tycho (as
Jesse mentioned) for the Studio build, then redoing these refactorings a
second time will not require manual fixes in Studio, which depends on shared
now. I can refactor Studio at the same time.


Conclusions
-----------------

So this example shows some things we can do to make things tighter and
easier for us to better manage our APIs. We can do anything we like to the
implementation to fix bugs and to improve performance in point releases
without impacting the minimal interfaces we expose for the API.

We can take similar steps inside the server to narrow the exposed SPI;
however, using OSGi is probably not going to be an option there right away
since it gets more complicated. Here in shared I would use bundle packaging
just to hide implementation classes, not to define services etc.

Also there are some classes that were proposed for shared, i.e. DnNode which
at this point in time are specific to the server. Sure Studio might use
these classes eventually, however these classes are not generic LDAP. These
classes can stay in shared but they should be kept in a module separate from
the ldap-model for example. Why, you may ask? Because these classes are not
generic LDAP classes (the way Entry, Dn, or Cursor are) and are not needed
by every client, nor are they viable for every server a client connects to.
They only serve a purpose when used in Studio, connecting to ApacheDS.

DnNode might be needed by Studio in the future for making a plugin and
widget that allows users to graphically manage the boundaries of
administrative areas, however it's not something every client needs, and it
certainly is not something needed by a generic client connecting to every
server.

So things like this as well as the category of interfaces and classes used
for modeling ApacheDS specific features which also are used by Studio should
be in their own modules, if kept in shared, separate from the model or the
codec bundles. This way they can remain in shared, used by both Studio and
ApacheDS without polluting the client API.  As an example, the ACI mechanism
we use is very ApacheDS specific and is used by Studio's ACI editor. I
wanted to say X.500 specific, but we've changed our ACIs a tiny bit. So we
might have an ldap-aci module that pulls these things out of the ldap-model
so our standard client API remains clean and light, free of our ApacheDS
specific features.

The power behind this API is the number of people and projects that will use
it. We don't want the OpenDS folks for example to avoid it just because they
don't want our ApacheDS specific interfaces weighing it down and
contaminating it. I'd love to see the API used with a light footprint on
mobile devices, so footprint will matter in this oddball case as well.

-- 
Alex Karasulu
My Blog :: http://www.jroller.com/akarasulu/
Apache Directory Server :: http://directory.apache.org
Apache MINA :: http://mina.apache.org
To set up a meeting with me: http://tungle.me/AlexKarasulu

Re: [DISCUSSION] General API & SPI Concerns

Posted by Alex Karasulu <ak...@gmail.com>.
You're 100% right: it's the OSGi environment that enforces the exported
packages and hides the rest. So I concur with your 3 points at the end.



On Jan 6, 2011, at 5:21 PM, Stefan Seelmann <ma...@stefan-seelmann.de>  
wrote:

> On Thu, Jan 6, 2011 at 2:58 PM, Alex Karasulu <ak...@apache.org>  
> wrote:
>>>  In the end, dependency upon further transitive dependencies are  
>>> making us
>>>> expose almost all implementation classes in shared, and most can  
>>>> easily be
>>>> decoupled and hidden. It's effectively making everything in  
>>>> shared come
>>>> together in one big heap exposing way more than we want to.
>>>>
>>> It's quite impossible in Java to 'hide' all the classes that a  
>>> user should
>>> not manipulate. Unless you use package protected classes, and it  
>>> quickly has
>>> a limit, I would rather think in term of 'exposed' (ie documented)  
>>> API.
>>
>>
>> OSGi bundles really helps in this respect. It fills in where Java  
>> left off.
>>
>> OSGi makes it so the (bundle) packaging coincides with module  
>> boundaries. In
>> Java this is loose and there's leakage all over, as you say, it's  
>> very hard
>> to hide all implementation classes.
>>
>>
>>> That this documented API is gathered in one separate module for  
>>> convenience
>>> is another aspect, but the user will still have to depend on all  
>>> the other
>>> modules.
>>>
>>>
>> Certainly, you're right, dependencies will still exist. A codec  
>> will be
>> depended upon for it's functionality even if we do hide the  
>> implementation
>> details under the hood.
>>
>> The value add here is not from avoiding a dependency. It's from not  
>> exposing
>> more than we have to and being able to hide the implementation.  
>> This way we
>> can change the implementation at will across point releases without  
>> having
>> to bump up to a major revision.
>>
>>
>>> So all in all, should we define a module (a maven module)  
>>> containing the
>>> public API and the associated implementation ? Probably (But this  
>>> is not an
>>> absolute necessity). I guess this is what you have in mind, so  
>>> let's see
>>> what's the proposal is...
>>>
>>>
>> We have multiple options for chopping this up. With bundles we have  
>> a nice
>> tool to carve out physical not just logical boundaries to our API's  
>> and only
>> expose those packages we need to show API users.
>
>
>
>
>>
>>
>>>
>>>> LDAP Client API
>>>> ------------------------
>>>>
>>>> Everyone agrees that this API is very important to get right with  
>>>> a 1.0.
>>>> Right now this API pulls in several public interfaces directly from
>>>> shared.
>>>> Those interfaces also pull in some implementation classes. The  
>>>> logical API
>>>> extends into shared this way. Effectively the majority of shared is
>>>> exposed
>>>> by the client API. The client API does not end at it's jar  
>>>> boundary.
>>>>
>>>> All this exposure increases the chances of API change when all
>>>> implementation details are wide open and part of the client API.   
>>>> And this
>>>> is what I'm trying to limit. There are ways we can decouple these
>>>> dependencies very nicely with a mixed bag of refactoring  
>>>> techniques while
>>>> breaking up shared-ldap into lesser more coherent modules. The  
>>>> idea is to
>>>> expose the bare minimum of only what we need to expose. Yes the  
>>>> shared
>>>> code
>>>> has become very stable over time but the most stability is in the
>>>> interfaces
>>>> and if we only expose these instead of implementation classes  
>>>> then we'll
>>>> have an awesome API that may remain 1.X for a while and not require
>>>> deprecations as new functionality is introduced.
>>>>
>>>
>>> How will you limit the visibility of the modules you don't want  
>>> the user to
>>> be exposed to ?
>>>
>>>
>> A combination of refactoring techniques will be used to be able to  
>> better
>> use standard Java protection mechanisms to hide implementation  
>> details
>> combined with using OSGi bundles instead of Jars to only export those
>> packages that we do want users to see.
>
> Alex, I agree with you that separating interfaces and implemenation
> details is a good thing. Also creating OSGi bundles with a minimal set
> exported packages is a good thing. But it only helps if the bundles
> are used in an OSGI environment.
>
> I think we'll continue to deploy those OSGi bundles (which are just
> Jars with a good META-INF/MANIFEST.MF) to maven central. And each user
> using those Jars will see and can use all classes, when not using an
> OSGI environment.
>
> So I think we need additional techniques for non-OSGi users to let
> them know which packages to use, for example:
> - Use a naming convention for internal packages, the name "internal"
> is used in Eclipse and Apache Felix, not sure if is specified in OSGi.
> - Create separate Jars for API and implementation (e.g. xyz-api.jar
> and xyz-impl.jar)
>
> Kind Regards,
> Stefan

Re: [DISCUSSION] General API & SPI Concerns

Posted by Stefan Seelmann <ma...@stefan-seelmann.de>.
On Thu, Jan 6, 2011 at 2:58 PM, Alex Karasulu <ak...@apache.org> wrote:
>>  In the end, dependency upon further transitive dependencies are making us
>>> expose almost all implementation classes in shared, and most can easily be
>>> decoupled and hidden. It's effectively making everything in shared come
>>> together in one big heap exposing way more than we want to.
>>>
>> It's quite impossible in Java to 'hide' all the classes that a user should
>> not manipulate. Unless you use package protected classes, and it quickly has
>> a limit, I would rather think in term of 'exposed' (ie documented) API.
>
>
> OSGi bundles really helps in this respect. It fills in where Java left off.
>
> OSGi makes it so the (bundle) packaging coincides with module boundaries. In
> Java this is loose and there's leakage all over, as you say, it's very hard
> to hide all implementation classes.
>
>
>> That this documented API is gathered in one separate module for convenience
>> is another aspect, but the user will still have to depend on all the other
>> modules.
>>
>>
> Certainly, you're right, dependencies will still exist. A codec will be
> depended upon for it's functionality even if we do hide the implementation
> details under the hood.
>
> The value add here is not from avoiding a dependency. It's from not exposing
> more than we have to and being able to hide the implementation. This way we
> can change the implementation at will across point releases without having
> to bump up to a major revision.
>
>
>> So all in all, should we define a module (a maven module) containing the
>> public API and the associated implementation ? Probably (But this is not an
>> absolute necessity). I guess this is what you have in mind, so let's see
>> what's the proposal is...
>>
>>
> We have multiple options for chopping this up. With bundles we have a nice
> tool to carve out physical not just logical boundaries to our API's and only
> expose those packages we need to show API users.




>
>
>>
>>> LDAP Client API
>>> ------------------------
>>>
>>> Everyone agrees that this API is very important to get right with a 1.0.
>>> Right now this API pulls in several public interfaces directly from
>>> shared.
>>> Those interfaces also pull in some implementation classes. The logical API
>>> extends into shared this way. Effectively the majority of shared is
>>> exposed
>>> by the client API. The client API does not end at it's jar boundary.
>>>
>>> All this exposure increases the chances of API change when all
>>> implementation details are wide open and part of the client API.  And this
>>> is what I'm trying to limit. There are ways we can decouple these
>>> dependencies very nicely with a mixed bag of refactoring techniques while
>>> breaking up shared-ldap into lesser more coherent modules. The idea is to
>>> expose the bare minimum of only what we need to expose. Yes the shared
>>> code
>>> has become very stable over time but the most stability is in the
>>> interfaces
>>> and if we only expose these instead of implementation classes then we'll
>>> have an awesome API that may remain 1.X for a while and not require
>>> deprecations as new functionality is introduced.
>>>
>>
>> How will you limit the visibility of the modules you don't want the user to
>> be exposed to ?
>>
>>
> A combination of refactoring techniques will be used to be able to better
> use standard Java protection mechanisms to hide implementation details
> combined with using OSGi bundles instead of Jars to only export those
> packages that we do want users to see.

Alex, I agree with you that separating interfaces and implementation
details is a good thing. Also creating OSGi bundles with a minimal set
of exported packages is a good thing. But it only helps if the bundles
are used in an OSGi environment.

I think we'll continue to deploy those OSGi bundles (which are just
Jars with a good META-INF/MANIFEST.MF) to Maven Central. And each user
using those Jars will see and can use all classes, when not using an
OSGi environment.

So I think we need additional techniques for non-OSGi users to let
them know which packages to use, for example:
- Use a naming convention for internal packages, the name "internal"
is used in Eclipse and Apache Felix, not sure if it is specified in OSGi.
- Create separate Jars for API and implementation (e.g. xyz-api.jar
and xyz-impl.jar)
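
The naming convention in the first point is easy to check mechanically. A
rough sketch, assuming the convention is "any package segment named
internal" (the helper and the package names are hypothetical):

```java
import java.util.Arrays;

// Hypothetical sketch: the helper and the package names are made up for illustration.
final class InternalPackageSketch {
    private InternalPackageSketch() {
    }

    /** True if any dot-separated segment of the package name is exactly "internal". */
    static boolean isInternal(String packageName) {
        return Arrays.asList(packageName.split("\\.")).contains("internal");
    }
}
```

A check like this could be applied to a list of imports to flag accidental
use of internal packages by non-OSGi users.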

Kind Regards,
Stefan

Re: [DISCUSSION] General API & SPI Concerns

Posted by Stefan Seelmann <ma...@stefan-seelmann.de>.
On Thu, Jan 6, 2011 at 3:43 PM, Emmanuel Lecharny <el...@gmail.com> wrote:
> On 1/6/11 2:58 PM, Alex Karasulu wrote:
>>>
>>> However, the schema manipulation API is in the scope of this discussion.
>>>
>>>
>> This is part of the LDAP API and is as critical as Dn, or Entry since it's
>> tied together.
>
> Damn, I overlooked this part. yes, you are totally right here, as soon as
> the LdapAPI provide Schema aware objects...
>>>
>>> Partition and configuration are not part of the Ldap API, thus are
>>> irrelevant in this discussion about shared refactoring.
>>>
>>>
>> Right this has nothing to do with shared APIs but is relavent to the
>> server.
>> The same policies in API maintenance in shared will have to be applied to
>> the server.
>
> yes, +1. I just want to discuss the server API in the server ML, to avoid
> confusion.
>>
>> </snip>
>> This is not to blame anyone. I am pointing out the problem, and pointing
>> out
>> a solution to it so we're not screwed by it. The web of dependencies in
>> shared will f**k us down the line if we don't nix em now.
>
> I'm wondering what would be the best way to get rid of those coupling... May
> be creating many maven modules (one per package) then we will immediately
> see the invalid coupling ? Or is there any tool we can use to detect the bad
> coupling ?

JDepend is such a tool; it helps to detect dependencies between packages.

There is a maven plugin that creates a report, see [1] for an example,
especially the "cycles" section. But please note that this report is
from November because the site generation is broken, I have to
check...

There is also an Eclipse plugin available, just search for JDepend in
the Eclipse Marketplace.
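
To show the kind of problem the "cycles" section points at, here is a
hand-rolled sketch of cycle detection over a package dependency map. It is
not JDepend's API, and the package names in the usage note are only
illustrative.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: this is a hand-rolled check, not JDepend's API.
final class PackageCycleSketch {
    private PackageCycleSketch() {
    }

    /** Returns true if the package dependency graph contains a cycle. */
    static boolean hasCycle(Map<String, List<String>> dependsOn) {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String pkg : dependsOn.keySet()) {
            if (visit(pkg, dependsOn, done, onPath)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String pkg, Map<String, List<String>> dependsOn,
                                 Set<String> done, Set<String> onPath) {
        if (onPath.contains(pkg)) {
            return true;   // reached a package already on the current path
        }
        if (done.contains(pkg)) {
            return false;  // fully explored earlier, no cycle through it
        }
        onPath.add(pkg);
        List<String> deps = dependsOn.getOrDefault(pkg, Collections.<String>emptyList());
        for (String dep : deps) {
            if (visit(dep, dependsOn, done, onPath)) {
                return true;
            }
        }
        onPath.remove(pkg);
        done.add(pkg);
        return false;
    }
}
```

For example, a map where message depends on codec, codec on util, and util
back on codec would be reported as cyclic, while dropping the util to codec
edge would make it acyclic.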

Kind Regards,
Stefan


[1] https://hudson.apache.org/hudson/view/A-F/view/Directory/job/dir-shared-site/site/shared-ldap/jdepend-report.html

Re: [DISCUSSION] General API & SPI Concerns

Posted by Alex Karasulu <ak...@apache.org>.
On Thu, Jan 6, 2011 at 4:43 PM, Emmanuel Lecharny <el...@gmail.com> wrote:

SNIP ...

On 1/6/11 2:58 PM, Alex Karasulu wrote:
>
>> This is not to blame anyone. I am pointing out the problem, and pointing
>>> out
>>
>>  a solution to it so we're not screwed by it. The web of dependencies in
>> shared will f**k us down the line if we don't nix em now.
>>
>
> I'm wondering what would be the best way to get rid of those coupling...
> May be creating many maven modules (one per package) then we will
> immediately see the invalid coupling ? Or is there any tool we can use to
> detect the bad coupling ?
>
>
Yeah, we could do a module per package, but that might be too eager.

For now, let the dependencies that we cannot remove with some simple tricks,
plus some common sense about what coherently goes together, loosely guide our
path. Let's be relaxed about it and not freak out about module explosion, but
let's not explode needlessly either.

I wish one simple equation solved these things but unfortunately it doesn't
:(.


> I must admit I have not investigated this area yet...
>
>
No worries. I got your back here and will give y'all an update about what
was done, how and why so we're on the same page at some point and can think
together on it to finally tidy up.

Just get this AP thing worked out without worrying yourself about these
details too much. What you're doing in AP land is much more important.

>> The work needed here is a joke really. The big issue with it is the impact
>> the changes in shared have all over the place in Studio and ApacheDS and
>> the
>> fact that we're better off waiting for AP work to complete to merge.
>>
> Absolutely. I know that I'm a bottleneck here, but OTOH, there is little I
> can do to move faster :/
>
>
Please don't feel rushed. Again, what you're doing is in one of the nastiest
areas of the server and is not trivial. It would be easier and simpler to
write a web server than this region of code. So just focus on doing it right
so it does not steal any more of your time.

I'll work on this stuff and update the list. I've got some stupid things to
take care of today so I will not be as aggressive until the weekend. Just a
heads up.


>
>   This might be due to eager reuse or the addition of
>>>
>>>> utility methods into codec classes for convenience. Some of these
>>>> dependencies can be removed by breaking out non-implementation specific
>>>> methods and constants in codec classes into utility methods outside of
>>>> the
>>>> package or the module all together. Furthermore the codec implementation
>>>> that handles [de]marshaling has to access package friendly (non-API)
>>>> methods
>>>> on implementation classes while encoding.
>>>>
>>>>  Not sure that I get what you mean here. Can you be a bit more explicit
>>> ?
>>>
>>
>> LdapEncoder accesses package friendly methods inside most message Impl
>> clases to encode them. This also pulls into message dependencies from
>> codec
>> which can be hidden. But these are really easy to fix. We just need to
>> know
>> that the situation is there and get rid of it.
>>
> Get it now.
>
> Btw, I still have some issues with the codec classes
> (LdapEncoder/LdapDecoder). They could be simplified, as we still live with
> some mechanisms used years ago. The Client-API codec is way simpler.
>
> We can discuss this point in a separate thread.
>
>
Sure, no problem. I have some ideas here too (nothing big) just to make it so
we can hide implementation better with the code making it more pluggable.
While you're working let me test the ideas out and post something about it.


>
>   In the end, dependency upon further transitive dependencies are making us
>>>
>>>> expose almost all implementation classes in shared, and most can easily
>>>> be
>>>> decoupled and hidden. It's effectively making everything in shared come
>>>> together in one big heap exposing way more than we want to.
>>>>
>>>>  It's quite impossible in Java to 'hide' all the classes that a user
>>> should
>>> not manipulate. Unless you use package protected classes, and it quickly
>>> has
>>> a limit, I would rather think in term of 'exposed' (ie documented) API.
>>>
>>
>> OSGi bundles really helps in this respect. It fills in where Java left
>> off.
>>
>> OSGi makes it so the (bundle) packaging coincides with module boundaries.
>> In
>> Java this is loose and there's leakage all over, as you say, it's very
>> hard
>> to hide all implementation classes.
>>
>
> True. I ruled out OSGi, but that may help a lot.


Yeah, but as Seelmann pointed out, you only get that benefit in the OSGi
environment. We can do more, like breaking things up better and using the
"internal" package name component he suggested.


>
>  That this documented API is gathered in one separate module for
>>> convenience
>>> is another aspect, but the user will still have to depend on all the
>>> other
>>> modules.
>>>
>>>
>>>  Certainly, you're right, dependencies will still exist. A codec will be
>> depended upon for it's functionality even if we do hide the implementation
>> details under the hood.
>>
>> The value add here is not from avoiding a dependency. It's from not
>> exposing
>> more than we have to and being able to hide the implementation. This way
>> we
>> can change the implementation at will across point releases without having
>> to bump up to a major revision.
>>
> what is important here, as you say, is to avoid exposing things that the
> user does not have to manipulate. It's noise to him.
>
>
>
>>  LDAP Client API
>>>> ------------------------
>>>>
>>>> Everyone agrees that this API is very important to get right with a 1.0.
>>>> Right now this API pulls in several public interfaces directly from
>>>> shared.
>>>> Those interfaces also pull in some implementation classes. The logical
>>>> API
>>>> extends into shared this way. Effectively the majority of shared is
>>>> exposed
>>>> by the client API. The client API does not end at it's jar boundary.
>>>>
>>>> All this exposure increases the chances of API change when all
>>>> implementation details are wide open and part of the client API.  And
>>>> this
>>>> is what I'm trying to limit. There are ways we can decouple these
>>>> dependencies very nicely with a mixed bag of refactoring techniques
>>>> while
>>>> breaking up shared-ldap into lesser more coherent modules. The idea is
>>>> to
>>>> expose the bare minimum of only what we need to expose. Yes the shared
>>>> code
>>>> has become very stable over time but the most stability is in the
>>>> interfaces
>>>> and if we only expose these instead of implementation classes then we'll
>>>> have an awesome API that may remain 1.X for a while and not require
>>>> deprecations as new functionality is introduced.
>>>>
>>>>  How will you limit the visibility of the modules you don't want the
>>> user to
>>> be exposed to ?
>>>
>>>
>>>  A combination of refactoring techniques will be used to be able to
>> better
>> use standard Java protection mechanisms to hide implementation details
>> combined with using OSGi bundles instead of Jars to only export those
>> packages that we do want users to see.
>>
>
> Let's see what it brings. I have the feeling that discussing about pros and
> cons ad nauseam will bring less light than a simple experiment. Let's be
> darwinist in this area, weak solutions will perish by lack of merit.
>
>
Perfect !


>
>  This is extremely painful to do such a cleanup without first decoupling
>>> all
>>> the pieces by creating separate jars, before regrouping the packages back
>>> again.
>>>
>>>
>>>  Why bother regrouping? We can regroup things for convenience if people
>> want
>> a single jar without deps like the shard-all thingy.
>>
>> We should not be uncomfortable having multiple modules to better decouple
>> this big hunk of code, and isolate coherent pieces as units.
>>
> My perception was that in Studio, having tens of modules were painful. We
> added the shared-all module, but most of the case, I think this creation of
> zillions of modules is quite artificial. However, at some point, it could
> help to have dedicated modules.
>
> We already have a separation by using packages, the question is how to
> correctly split the big ball of mud in smaller but useful modules.
>
> For instance, to me, it makes sense to have a separate ASN1 module, for the
> sake of hiding this detail to the user. Really, who cares about ASN.1 ? Why
> do we have to expose the ASN1 classes in the ldap-api ? So this is a valid
> reason.
>

We don't need to, yeah, but it's nice to pull it out to break up dependencies
and hide implementation classes. But somehow I see many ASN.1 things often
being needed in the server even at higher levels.


>
> Another example of good separation is the DSML module. It's not part of the
> core LDAP api, and if I, as a user, don't do DSML, why should I be forced to
> include it in my dependencies? This is also a valid reason for having DSML
> be a separate module.
>
>
+1


> OTOH, and we probably went to far back in september, we don't force the
> user to declare a dependence on either a big shared-all with many parts he
> is not interested in, or many dependences on many small jars.
>
> We have to find some balance here, and the suggested separation (in your
> first mail) is probably making a lot of sense (see below).
>
>
Cool


>
>  The question here is more to know how far we want to go, considering that
>>> shared contains 900 classes, more than 5600 methods and around 80
>>> packages.
>>>
>>>
>>>  Yep it's big but the problem here is not massive. It starts slowly
>> solving a
>> couple things and once you decouple a few things, decoupling others
>> becomes
>> much much easier and a layout to all of it starts falling out nicely,
>> which
>> shows even if we dumped here and created some cleanup issues for ourselves
>> the overall code really was written well.
>>
>
> Tooling could help. I don't know which tool exactly, but this is an area we
> never really explored.
>
>
I use IDEA's stuff for refactoring and code analysis. IDEA does it much
better than Eclipse. Then I switch to Eclipse for regular coding. I might
just stay in IDEA from now on; I'm digging IDEA 10, it's really fast.


>
>>       (2) Breaking up shared into multiple Maven modules so now there's
>>> the
>>>
>>>> following modules:
>>>>
>>>>           o shared-util
>>>>           o asn1-api
>>>>           o asn1-ber
>>>>           o ldap-model
>>>>                  - name pkg
>>>>                  - message pkg (no impl classes)
>>>>                  - schema pkg
>>>>                  - cursor pkg
>>>>                  - filter pkg
>>>>                  - entry pkg
>>>>                  - constants pkg
>>>>           o ldap-codec (not complete)
>>>>
>>>>  I would not have 2 maven modules for asn1. It's probably overkilling. I
>>> would rather name the ldap-model ldap-api, because this is exactly what
>>> it
>>> is.
>>>
>>>  There are reasons for this to be able to get the codec to be separable.
>> Once
>> you get in and play with the little non-important details you'll probably
>> come to the same conclusion yourself.
>>
> In fact, we have ber and der codec. I'm not sure I want to expose that in
> LDAP. If we have used an asn1 compiler to generate the codecs, then yes, we
> would have had 3 modules : asn1-api, asn1-ber and asn1-der. Plus the
> generated codec.
>
> I don't know if it worths the effort here. Exposing a monolithic asn1
> module should not be a big issue. It won't change.
>
>
I'm going off how well I can break up dependencies here; that's why I
created asn1-api and asn1-ber (maybe I should have called it asn1-impl). But
we have the option of consolidation later on once we can look at the
dependencies between modules when all the dependency cleanups are done.

I think then we'll have a better picture of what should be grouped together.
For now let's just cleanup as best as possible then see how the independent
blocks can be put together.


>> Let me propose a methodology we can follow here to speed things up without
>> needlessly arguing each point because in the end we do in fact come to the
>> same conclusions.
>>
>> Let's just relax while decoupling about the number or name of modules. The
>> first pass should be about breaking up dependencies to hide the
>> implementation details so we're free to solve these aggressively without
>> inhibitions.
>>
>> Then once we see a clear dependency between modules, we can take another
>> pass at consolidation as a separate concern and discussion. Until we see the
>> real dependency picture fall out from refactoring, discussing it is moot.
>>
>
> Ok, I buy that. As I said earlier, discussion is good to have, but it's not
> as valid as action. We can move back and forth with the code as a base for a
> further discussion anyway.


+1


>
>>> They are helper classes. They certainly don't belong to
>>> ldap-model/ldap-api, and if they have to stay in shared, I would like to
>>> move them to utils.
>>>
>> We discussed this last night. Just wanted to point out the clarification you
>> made to me.
>>
>> Shared will have shared-utils for generic utility classes that can be used
>> by anything, not just LDAP code. Then there may be a shared-ldap-util but I
>> think this might be overkill and not such a good idea. Let me explain
>> technically why:
>>
>> If we dump these LDAP-specific utility classes into an ldap-util, then a
>> dependency on one util class pulls in utility classes in the rest of the
>> module, increasing footprint perhaps needlessly.
>>
>> What is needed for minimal generic client operation should be kept together
>> with as few dependencies as possible. No need to expose the plethora of
>> utility classes we have amassed in there.
>>
>> Yes they are very useful utilities but it's not about packaging freebies,
>> it's about keeping things small, tight, and minimizing exposure. We can
>> package these things into a separate jar and only use them in Studio and in
>> ApacheDS.
>>
>> So in conclusion what I am saying is the general formula we have become
>> accustomed to, where we throw all utility classes into one module, no longer
>> works blindly for us. We need to think about what needless dependencies and
>> classes this is including.
>>
>
> Here, again, we need action now. Enough discussion, let's move to code. We
> can then discuss the pros and cons later, and iterate.
>
>
Excellent - we're on the same page.

-- 
Alex Karasulu
My Blog :: http://www.jroller.com/akarasulu/
Apache Directory Server :: http://directory.apache.org
Apache MINA :: http://mina.apache.org
To set up a meeting with me: http://tungle.me/AlexKarasulu

Re: [DISCUSSION] General API & SPI Concerns

Posted by Emmanuel Lecharny <el...@gmail.com>.
On 1/6/11 2:58 PM, Alex Karasulu wrote:
>> However, the schema manipulation API is in the scope of this discussion.
>>
>>
> This is part of the LDAP API and is as critical as Dn, or Entry since it's
> tied together.
Damn, I overlooked this part. Yes, you are totally right here, as soon
as the LDAP API provides schema-aware objects...
>> Partition and configuration are not part of the Ldap API, thus are
>> irrelevant in this discussion about shared refactoring.
>>
>>
> Right this has nothing to do with shared APIs but is relevant to the server.
> The same policies in API maintenance in shared will have to be applied to
> the server.
Yes, +1. I just want to discuss the server API on the server ML, to
avoid confusion.
> </snip>
> This is not to blame anyone. I am pointing out the problem, and pointing out
> a solution to it so we're not screwed by it. The web of dependencies in
> shared will f**k us down the line if we don't nix em now.

I'm wondering what would be the best way to get rid of that coupling...
Maybe by creating many Maven modules (one per package), then we would
immediately see the invalid coupling? Or is there any tool we can use
to detect the bad coupling?

I must admit I have not investigated this area yet...
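
On the tooling question above, here is a minimal sketch of such a check,
assuming the source files are available as text. It scans import statements
and reports any that reach into a package the module should not depend on.
The class name CouplingCheckSketch and the example.shared.ldap.* package
names are hypothetical and chosen only for illustration; a real dependency
analysis tool would inspect bytecode and catch more than imports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of shared: flags imports from a forbidden package.
final class CouplingCheckSketch {

    // Matches "import a.b.C" and "import static a.b.C.x" at the start of a line.
    private static final Pattern IMPORT =
        Pattern.compile("^\\s*import\\s+(?:static\\s+)?([\\w.]+)", Pattern.MULTILINE);

    private CouplingCheckSketch() {
    }

    // Returns every imported name in 'source' that starts with 'forbiddenPrefix'.
    static List<String> forbiddenImports(String source, String forbiddenPrefix) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = IMPORT.matcher(source);
        while (matcher.find()) {
            String imported = matcher.group(1);
            if (imported.startsWith(forbiddenPrefix)) {
                hits.add(imported);
            }
        }
        return hits;
    }
}
```

Run over every file of a message-style package with the codec package as
the forbidden prefix, an empty result for all files would suggest the
import-level coupling is gone. It says nothing about fully qualified names
used inline.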
>>> implementation classes depend on codec functionality which is an
>>> implementation detail.
>>>
>> Not true anymore (or is it?).
>>
>>
> Yeah there are some residual dependencies but not a big deal to fix. Trivial
> stuff.
We then have to fix them.

> The work needed here is a joke really. The big issue with it is the impact
> the changes in shared have all over the place in Studio and ApacheDS and the
> fact that we're better off waiting for AP work to complete to merge.
Absolutely. I know that I'm a bottleneck here, but OTOH, there is little 
I can do to move faster :/

>>   This might be due to eager reuse or the addition of
>>> utility methods into codec classes for convenience. Some of these
>>> dependencies can be removed by breaking out non-implementation specific
>>> methods and constants in codec classes into utility methods outside of the
>>> package or the module all together. Furthermore the codec implementation
>>> that handles [de]marshaling has to access package friendly (non-API)
>>> methods
>>> on implementation classes while encoding.
>>>
>> Not sure that I get what you mean here. Can you be a bit more explicit ?
>
> LdapEncoder accesses package friendly methods inside most message Impl
> classes to encode them. This also pulls dependencies from codec into message,
> which can be hidden. But these are really easy to fix. We just need to know
> that the situation is there and get rid of it.
Get it now.
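
The package-friendly access described above can be avoided by letting the
encoder see only the interface. A minimal sketch, assuming hypothetical
names (DelRequestSketch, DelRequestSketchImpl, MessageFactorySketch,
EncoderSketch), none of which are the actual shared-ldap classes, and with a
plain UTF-8 conversion standing in for real BER encoding:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical API type; in a real layout this would be a public interface.
interface DelRequestSketch {
    String getName();
}

// Implementation detail: non-public, so callers outside the package cannot name it.
final class DelRequestSketchImpl implements DelRequestSketch {
    private final String name;

    DelRequestSketchImpl(String name) {
        this.name = name;
    }

    @Override
    public String getName() {
        return name;
    }
}

// Hypothetical factory: the only way for callers to obtain an instance.
final class MessageFactorySketch {
    private MessageFactorySketch() {
    }

    static DelRequestSketch newDelRequest(String name) {
        return new DelRequestSketchImpl(name);
    }
}

// Hypothetical encoder: depends on the interface only, never on the Impl class.
final class EncoderSketch {
    private EncoderSketch() {
    }

    // Placeholder conversion, NOT real BER encoding.
    static byte[] encode(DelRequestSketch request) {
        return request.getName().getBytes(StandardCharsets.UTF_8);
    }
}
```

All types are package-private here only to keep the sketch in one file.
The point is that EncoderSketch compiles against DelRequestSketch alone, so
the Impl class can change or move without touching the encoder.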

Btw, I still have some issues with the codec classes 
(LdapEncoder/LdapDecoder). They could be simplified, as we still live 
with some mechanisms used years ago. The Client-API codec is way simpler.

We can discuss this point in a separate thread.

>>   In the end, dependency upon further transitive dependencies are making us
>>> expose almost all implementation classes in shared, and most can easily be
>>> decoupled and hidden. It's effectively making everything in shared come
>>> together in one big heap exposing way more than we want to.
>>>
>> It's quite impossible in Java to 'hide' all the classes that a user should
>> not manipulate. Unless you use package protected classes, and that quickly
>> hits a limit, I would rather think in terms of 'exposed' (ie documented) API.
>
> OSGi bundles really help in this respect. They fill in where Java left off.
>
> OSGi makes it so the (bundle) packaging coincides with module boundaries. In
> Java this is loose and there's leakage all over, as you say, it's very hard
> to hide all implementation classes.

True. I ruled out OSGi, but that may help a lot.
>> That this documented API is gathered in one separate module for convenience
>> is another aspect, but the user will still have to depend on all the other
>> modules.
>>
>>
> Certainly, you're right, dependencies will still exist. A codec will be
> depended upon for its functionality even if we do hide the implementation
> details under the hood.
>
> The value add here is not from avoiding a dependency. It's from not exposing
> more than we have to and being able to hide the implementation. This way we
> can change the implementation at will across point releases without having
> to bump up to a major revision.
What is important here, as you say, is to avoid exposing things that the 
user does not have to manipulate. It's noise to him.

>
>>> LDAP Client API
>>> ------------------------
>>>
>>> Everyone agrees that this API is very important to get right with a 1.0.
>>> Right now this API pulls in several public interfaces directly from
>>> shared.
>>> Those interfaces also pull in some implementation classes. The logical API
>>> extends into shared this way. Effectively the majority of shared is
>>> exposed
>>> by the client API. The client API does not end at its jar boundary.
>>>
>>> All this exposure increases the chances of API change when all
>>> implementation details are wide open and part of the client API.  And this
>>> is what I'm trying to limit. There are ways we can decouple these
>>> dependencies very nicely with a mixed bag of refactoring techniques while
>>> breaking up shared-ldap into smaller, more coherent modules. The idea is to
>>> expose the bare minimum of only what we need to expose. Yes the shared
>>> code
>>> has become very stable over time but the most stability is in the
>>> interfaces
>>> and if we only expose these instead of implementation classes then we'll
>>> have an awesome API that may remain 1.X for a while and not require
>>> deprecations as new functionality is introduced.
>>>
>> How will you limit the visibility of the modules you don't want the user to
>> be exposed to ?
>>
>>
> A combination of refactoring techniques will be used to be able to better
> use standard Java protection mechanisms to hide implementation details
> combined with using OSGi bundles instead of Jars to only export those
> packages that we do want users to see.

Let's see what it brings. I have the feeling that discussing pros
and cons ad nauseam will bring less light than a simple experiment.
Let's be Darwinist in this area, weak solutions will perish for lack of
merit.

>> It is extremely painful to do such a cleanup without first decoupling all
>> the pieces by creating separate jars, before regrouping the packages back
>> again.
>>
>>
> Why bother regrouping? We can regroup things for convenience if people want
> a single jar without deps like the shared-all thingy.
>
> We should not be uncomfortable having multiple modules to better decouple
> this big hunk of code, and isolate coherent pieces as units.
My perception was that in Studio, having tens of modules was painful.
We added the shared-all module, but in most cases, I think this
creation of zillions of modules is quite artificial. However, at some
point, it could help to have dedicated modules.

We already have a separation by using packages, the question is how to
correctly split the big ball of mud into smaller but useful modules.

For instance, to me, it makes sense to have a separate ASN1 module, for
the sake of hiding this detail from the user. Really, who cares about
ASN.1? Why do we have to expose the ASN1 classes in the ldap-api? So
this is a valid reason.

Another example of good separation is the DSML module. It's not part of 
the core LDAP api, and if I, as a user, don't do DSML, why should I be 
forced to include it in my dependencies? This is also a valid reason for 
having DSML be a separate module.

OTOH, and we probably went too far back in September, we don't want to force
the user to declare a dependency on either a big shared-all with many parts
he is not interested in, or many dependencies on many small jars.

We have to find some balance here, and the suggested separation (in your
first mail) probably makes a lot of sense (see below).

>> The question here is more to know how far we want to go, considering that
>> shared contains 900 classes, more than 5600 methods and around 80 packages.
>>
>>
> Yep it's big but the problem here is not massive. It starts slowly solving a
> couple things and once you decouple a few things, decoupling others becomes
> much much easier and a layout to all of it starts falling out nicely, which
> shows even if we dumped here and created some cleanup issues for ourselves
> the overall code really was written well.

Tooling could help. I don't know which tool exactly, but this is an area 
we never really explored.
>
>>       (2) Breaking up shared into multiple Maven modules so now there's the
>>> following modules:
>>>
>>>            o shared-util
>>>            o asn1-api
>>>            o asn1-ber
>>>            o ldap-model
>>>                   - name pkg
>>>                   - message pkg (no impl classes)
>>>                   - schema pkg
>>>                   - cursor pkg
>>>                   - filter pkg
>>>                   - entry pkg
>>>                   - constants pkg
>>>            o ldap-codec (not complete)
>>>
>> I would not have 2 maven modules for asn1. It's probably overkill. I
>> would rather name the ldap-model ldap-api, because this is exactly what it
>> is.
>>
> There are reasons for this to be able to get the codec to be separable. Once
> you get in and play with the little non-important details you'll probably
> come to the same conclusion yourself.
In fact, we have BER and DER codecs. I'm not sure I want to expose that
in LDAP. If we had used an ASN.1 compiler to generate the codecs, then
yes, we would have had 3 modules: asn1-api, asn1-ber and asn1-der. Plus
the generated codec.

I don't know if it is worth the effort here. Exposing a monolithic asn1
module should not be a big issue. It won't change.

> Let me propose a methodology we can follow here to speed things up without
> needlessly arguing each point because in the end we do in fact come to the
> same conclusions.
>
> Let's just relax while decoupling about the number or name of modules. The
> first pass should be about breaking up dependencies to hide the
> implementation details so we're free to solve these aggressively without
> inhibitions.
>
> Then once we see a clear dependency between modules, we can take another
> pass at consolidation as a separate concern and discussion. Until we see the
> real dependency picture fall out from refactoring, discussing it is moot.

Ok, I buy that. As I said earlier, discussion is good to have, but it's 
not as valid as action. We can move back and forth with the code as a 
base for a further discussion anyway.
>> They are helper classes. They certainly don't belong to
>> ldap-model/ldap-api, and if they have to stay in shared, I would like to
>> move them to utils.
>>
>>
> We discussed this last night. Just wanted to point out the clarification you
> made to me.
>
> Shared will have shared-utils for generic utility classes that can be used
> by anything, not just LDAP code. Then there may be a shared-ldap-util but I
> think this might be overkill and not such a good idea. Let me explain
> technically why:
>
> If we dump these LDAP-specific utility classes into an ldap-util, then a
> dependency on one util class pulls in utility classes in the rest of the
> module, increasing footprint perhaps needlessly.
>
> What is needed for minimal generic client operation should be kept together
> with as few dependencies as possible. No need to expose the plethora of
> utility classes we have amassed in there.
>
> Yes they are very useful utilities but it's not about packaging freebies,
> it's about keeping things small, tight, and minimizing exposure. We can
> package these things into a separate jar and only use them in Studio and in
> ApacheDS.
>
> So in conclusion what I am saying is the general formula we have become
> accustomed to, where we throw all utility classes into one module, no longer
> works blindly for us. We need to think about what needless dependencies and
> classes this is including.

Here, again, we need action now. Enough discussion, let's move to code. 
We can then discuss the pros and cons later, and iterate.


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: [DISCUSSION] General API & SPI Concerns

Posted by Alex Karasulu <ak...@apache.org>.
On Thu, Jan 6, 2011 at 4:49 AM, Emmanuel Lecharny <el...@gmail.com> wrote:

> On 1/6/11 2:36 AM, Alex Karasulu wrote:
>
>> Hi all,
>>
>> Excuse the cross post but this also has significance to the API list.
>>
>> Problem
>> ------------
>>
>> For our benefit and the benefit of our users we need to be uber careful with
>> changes after a major GA release. We have another thread where it seems
>> people agree with the Eclipse scheme of versioning and this sounds really
>> flexible for our needs. We can do a 2.0.0-M1 release at any time without
>> clamping down on APIs. Only when we do an RC do we have to freeze changes to
>> interfaces.
>>
>> The debate still remains as to what constitutes an interface. Emmanuel seems
>> to disagree with configuration, schema, and partition db formats as being
>> interfaces of concern but for the time being we can just discuss those we do
>> agree on. There's no doubt about APIs and SPIs.
>>
>
> I don't disagree with Schema, but Schema are clearly defined by RFCs, there
> is no possible interpretation about their syntax and definition.


Absolutely, I do agree with you. I was thinking of the case where we find a
bug or mistake in our core published schema, or we change the ApacheDS meta
schema. In that case we'd need to bump up to a major version because a
software update alone will not solve problems with entries already created on
disk using the older schema. The situation will be undefined - very hard to
predict.
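
A minimal sketch of that rule, assuming a three-segment major.minor.service
version. The names ChangeKind and VersionSketch are hypothetical. Only the
major bump for breaking and on-disk schema changes comes from this thread;
the minor and service cases are an assumed reading of an Eclipse-style
scheme, not an agreed project policy.

```java
// Hypothetical classification of changes; the names are illustrative only.
enum ChangeKind {
    BUG_FIX,
    COMPATIBLE_ADDITION,
    BREAKING_INTERFACE_CHANGE,
    ON_DISK_SCHEMA_CHANGE
}

// Hypothetical immutable version value with major.minor.service segments.
final class VersionSketch {
    final int major;
    final int minor;
    final int service;

    VersionSketch(int major, int minor, int service) {
        this.major = major;
        this.minor = minor;
        this.service = service;
    }

    // Returns the next version for the given kind of change.
    VersionSketch bump(ChangeKind kind) {
        switch (kind) {
            case BREAKING_INTERFACE_CHANGE:
            case ON_DISK_SCHEMA_CHANGE:
                // Existing data or callers may break: new major version.
                return new VersionSketch(major + 1, 0, 0);
            case COMPATIBLE_ADDITION:
                return new VersionSketch(major, minor + 1, 0);
            default:
                return new VersionSketch(major, minor, service + 1);
        }
    }

    @Override
    public String toString() {
        return major + "." + minor + "." + service;
    }
}
```

Treating an on-disk schema change like a breaking interface change is the
design choice being illustrated: a software update alone cannot repair
entries written under the old schema.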


> However, the schema manipulation API is in the scope of this discussion.
>
>
This is part of the LDAP API and is as critical as Dn or Entry since they're
all tied together.


> Partition and configuration are not part of the Ldap API, thus are
> irrelevant in this discussion about shared refactoring.
>
>
Right, this has nothing to do with shared APIs, but it is relevant to the
server. The same API maintenance policies we apply in shared will have to be
applied to the server.

To me, anything exposed is something to consider for backwards
compatibility, and I'm not talking just about APIs here. Whether it's LDAP
extended operations, web services, or database formats, these things impact
backwards compatibility.


>
>  Solution
>> ------------
>>
>> So how do we make this as painless to us and users as much as is possible?
>> The best way is to keep the surface area of the SPI or API small, create
>> solid boundaries, and avoid exposing implementation details and
>> implementation classes.
>>
>> By reducing the surface area with implementation hiding we can effectively
>> limit exposure and reduce the probability of needing to make a change that
>> breaks with our user contract. You might be asking what's a real world
>> example of this for us in shared?
>>
>> And incidentally this is one of the things I've been working on in my
>> branch.
>>
>>
>> Real World Example in Shared
>> --------------------------------------------
>>
>> Let's take the o.a.d.s.ldap.message package as an example. This package
>> contains classes and interfaces modeling LDAP requests and responses: i.e.
>> AddRequest, DeleteResponse etc. It's in the shared-ldap module.
>>
>> In this package, in addition to request response interfaces, we're
>> exposing
>> implementation classes for them. The implementation classes, in turn have
>> dependencies on o.a.d.s.ldap.codec.* packages.
>>
> Not any more, I hope. We did a big refactoring last september in order to
> remove this coupling. Of course, we may have some remaining dependencies,
> but this is more or less not intentional.
>
>
Right, not intentional, and this is just one example among many. Look, we've
all used shared as a dumping ground. While our primary focus was a tough
problem in Studio or in ApacheDS, we put minimal energy into shared as we
deposited some classes and interfaces into it, because the main focus was
something else.

This is not to blame anyone. I am pointing out the problem, and pointing out
a solution to it so we're not screwed by it. The web of dependencies in
shared will f**k us down the line if we don't nix em now.


>
>   This is because some
>> implementation classes depend on codec functionality which is an
>> implementation detail.
>>
>
> Not true anymore (or is it?).
>
>
Yeah, there are some residual dependencies, but they're not a big deal to
fix. Trivial stuff.

The work needed here is a joke really. The big issue is the impact that
changes in shared have all over the place in Studio and ApacheDS, and the
fact that we're better off waiting for the AP work to complete before
merging.


>  This might be due to eager reuse or the addition of
>> utility methods into codec classes for convenience. Some of these
>> dependencies can be removed by breaking out non-implementation specific
>> methods and constants in codec classes into utility methods outside of the
>> package or the module all together. Furthermore the codec implementation
>> that handles [de]marshaling has to access package friendly (non-API)
>> methods
>> on implementation classes while encoding.
>>
> Not sure that I get what you mean here. Can you be a bit more explicit ?


LdapEncoder accesses package friendly methods inside most message Impl
classes to encode them. This also pulls codec dependencies into the message
package, and those can be hidden. But these are really easy to fix. We just
need to know that the situation is there and get rid of it.
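
One possible way to remove that kind of access, sketched here purely as an
illustration: the codec wraps the public message interface in its own
decorator and keeps the encoding state there, so it never has to reach into
package friendly methods of an Impl class. The names DeleteRequestDecorator
and computeNameLength are hypothetical, the DeleteRequest interface is
trimmed to two methods, and none of this is taken from the actual shared
code.

```java
import java.nio.charset.StandardCharsets;

/** Public model interface, trimmed for the sketch (hypothetical shape). */
interface DeleteRequest {
    int getMessageId();
    String getName();
}

/**
 * Codec-side decorator (hypothetical). It implements the same interface,
 * delegates to the wrapped message, and owns the state needed for encoding.
 */
final class DeleteRequestDecorator implements DeleteRequest {
    private final DeleteRequest decorated;
    private int nameLength = -1; // encoding state lives in the codec, not the model

    DeleteRequestDecorator(DeleteRequest decorated) {
        this.decorated = decorated;
    }

    @Override
    public int getMessageId() {
        return decorated.getMessageId();
    }

    @Override
    public String getName() {
        return decorated.getName();
    }

    /** Computes and caches the UTF-8 byte length of the name, using only public accessors. */
    int computeNameLength() {
        nameLength = getName().getBytes(StandardCharsets.UTF_8).length;
        return nameLength;
    }
}
```

With this shape the message package needs no knowledge of the codec at all,
and the decorator can stay in a package that is never shown to API users.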


>
>  In the end, dependency upon further transitive dependencies are making us
>> expose almost all implementation classes in shared, and most can easily be
>> decoupled and hidden. It's effectively making everything in shared come
>> together in one big heap exposing way more than we want to.
>>
> It's quite impossible in Java to 'hide' all the classes that a user should
> not manipulate. Unless you use package protected classes, and it quickly has
> a limit, I would rather think in term of 'exposed' (ie documented) API.


OSGi bundles really help in this respect. They fill in where Java left off.

OSGi makes it so the (bundle) packaging coincides with module boundaries. In
plain Java this is loose and there's leakage all over; as you say, it's very
hard to hide all implementation classes.


> That this documented API is gathered in one separate module for convenience
> is another aspect, but the user will still have to depend on all the other
> modules.
>
>
Certainly, you're right, dependencies will still exist. A codec will be
depended upon for its functionality even if we do hide the implementation
details under the hood.

The value add here is not from avoiding a dependency. It's from not exposing
more than we have to and being able to hide the implementation. This way we
can change the implementation at will across point releases without having
to bump up to a major revision.
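
As a minimal sketch of what hiding the implementation buys us (all names
here are hypothetical and the interface is cut down to one method): callers
only ever see the interface and a factory method, so the Impl class can be
rewritten or replaced in a point release without touching the contract.

```java
/** Hypothetical holder class so the sketch fits in a single file. */
final class MessageApiSketch {

    /** The exposed contract: the only type callers compile against. */
    interface AddRequest {
        String getEntryDn();
    }

    /** Hidden implementation: private, so outside code cannot name it or depend on it. */
    private static final class AddRequestImpl implements AddRequest {
        private final String entryDn;

        AddRequestImpl(String entryDn) {
            this.entryDn = entryDn;
        }

        @Override
        public String getEntryDn() {
            return entryDn;
        }
    }

    /** Factory method: the only way to obtain an instance. */
    static AddRequest newAddRequest(String entryDn) {
        return new AddRequestImpl(entryDn);
    }

    public static void main(String[] args) {
        AddRequest request = newAddRequest("ou=system");
        System.out.println(request.getEntryDn());
    }
}
```

In the real modules the same effect would come from package-private classes
plus non-exported packages rather than a nested private class.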


> So all in all, should we define a module (a maven module) containing the
> public API and the associated implementation ? Probably (But this is not an
> absolute necessity). I guess this is what you have in mind, so let's see
> what's the proposal is...
>
>
We have multiple options for chopping this up. With bundles we have a nice
tool to carve out physical, not just logical, boundaries for our APIs and to
expose only those packages we need to show API users.


>
>> LDAP Client API
>> ------------------------
>>
>> Everyone agrees that this API is very important to get right with a 1.0.
>> Right now this API pulls in several public interfaces directly from
>> shared.
>> Those interfaces also pull in some implementation classes. The logical API
>> extends into shared this way. Effectively the majority of shared is
>> exposed
>> by the client API. The client API does not end at it's jar boundary.
>>
>> All this exposure increases the chances of API change when all
>> implementation details are wide open and part of the client API.  And this
>> is what I'm trying to limit. There are ways we can decouple these
>> dependencies very nicely with a mixed bag of refactoring techniques while
>> breaking up shared-ldap into lesser more coherent modules. The idea is to
>> expose the bare minimum of only what we need to expose. Yes the shared
>> code
>> has become very stable over time but the most stability is in the
>> interfaces
>> and if we only expose these instead of implementation classes then we'll
>> have an awesome API that may remain 1.X for a while and not require
>> deprecations as new functionality is introduced.
>>
>
> How will you limit the visibility of the modules you don't want the user to
> be exposed to ?
>
>
A combination of refactoring techniques will be used to be able to better
use standard Java protection mechanisms to hide implementation details
combined with using OSGi bundles instead of Jars to only export those
packages that we do want users to see.
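
To make the bundle part concrete, here is a rough sketch of the manifest
headers involved, built with the JDK's java.util.jar.Manifest class so it
can be run as-is. The bundle and package names (org.example.ldap.codec and
org.example.ldap.codec.api) are made up for illustration, and a real build
would normally generate these headers with a bundle plugin instead of by
hand.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Hypothetical helper that assembles OSGi headers for an API-only export. */
final class BundleManifestSketch {

    static Manifest apiOnlyManifest() {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // Standard OSGi bundle identity headers (values are placeholders).
        main.putValue("Bundle-ManifestVersion", "2");
        main.putValue("Bundle-SymbolicName", "org.example.ldap.codec");
        main.putValue("Bundle-Version", "1.0.0");
        // Only the listed package is visible to other bundles. Any other
        // package in the jar (for example an impl package) stays private.
        main.putValue("Export-Package", "org.example.ldap.codec.api");
        return manifest;
    }

    public static void main(String[] args) throws java.io.IOException {
        // Writes the manifest text to standard output.
        apiOnlyManifest().write(System.out);
    }
}
```

Outside an OSGi container the jar still behaves like a regular jar, so the
export list only enforces the boundary for users running under OSGi.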


>  Finishing Up the Example
>> -------------------------------------
>>
>> So what concrete things can we do?
>>
>> The biggest step is to hide as many of the implementation classes as
>> possible. In my experimental branch I started by:
>>
>>     (1) Moving out methods and constants in codec classes causing
>> unnecessary dependencies from message package classes and interfaces.
>> There
>> was a situation even where StringTools for example depended on codec
>> classes, and virtually everything doing string related operations used
>> StringTools there by causing man interdependencies. It then becomes a web
>> of
>> dependencies across packages.
>>
> There is *one* method in StringTools that calls a codec method :
> Hex.encodeHex. It's a mistake, as we already have another StringTools method
> (toHexString) doing the same thing (to be double chekced). This is typically
> a wrong usage of a class from a wrong package, and we should get rid of such
> coupling.
>

Yep, no big deal, something that gets fixed in seconds, but we have a few of
these kinds of examples. One by one they're nothing, but all together they
create this web that makes almost everything transitively dependent on
everything else. This is easy to fix.
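
For the hex case specifically, the fix can be as small as a self-contained
helper in the util module that needs nothing from the codec. The sketch
below is an assumption about what such a helper could look like; it is not
the existing StringTools.toHexString, whose exact behavior still has to be
double checked as noted above.

```java
/** Hypothetical util-module helper with no dependency on any codec class. */
final class HexSketch {
    private static final char[] HEX_DIGITS = "0123456789abcdef".toCharArray();

    private HexSketch() {
    }

    /** Returns the lowercase hex form of the given bytes, two characters per byte. */
    static String toHexString(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int value = bytes[i] & 0xff;             // treat the byte as unsigned
            out[2 * i] = HEX_DIGITS[value >>> 4];    // high nibble
            out[2 * i + 1] = HEX_DIGITS[value & 0x0f]; // low nibble
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(toHexString(new byte[] { 0x00, 0x0f, (byte) 0xff }));
    }
}
```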


>
> This is extremely painful to do such a cleanup without first decoupling all
> the pieces by creating separate jars, before regrouping the packages back
> again.
>
>
Why bother regrouping? We can regroup things for convenience if people want
a single jar without deps, like the shared-all thingy.

We should not be uncomfortable having multiple modules to better decouple
this big hunk of code, and isolate coherent pieces as units.


> The question here is more to know how far we want to go, considering that
> shared contains 900 classes, more than 5600 methods and around 80 packages.
>
>
Yep, it's big, but the problem here is not massive. It starts slowly, with
solving a couple of things. Once you decouple a few things, decoupling the
others becomes much, much easier, and a layout for all of it starts falling
out nicely. That shows that even if we dumped things here and created some
cleanup issues for ourselves, the overall code really was written well.


>      (2) Breaking up shared into multiple Maven modules so now there's the
>> following modules:
>>
>>           o shared-util
>>           o asn1-api
>>           o asn1-ber
>>           o ldap-model
>>                  - name pkg
>>                  - message pkg (no impl classes)
>>                  - schema pkg
>>                  - cursor pkg
>>                  - filter pkg
>>                  - entry pkg
>>                  - constants pkg
>>           o ldap-codec (not complete)
>>
> I would not have 2 maven modules for asn1. It's probably overkilling. I
> would rather name the ldap-model ldap-api, because this is exactly what it
> is.
>

There are reasons for this: the split is what allows the codec to be
separable. Once you get in and play with the little non-important details
you'll probably come to the same conclusion yourself.

Let me propose a methodology we can follow here to speed things up without
needlessly arguing each point because in the end we do in fact come to the
same conclusions.

Let's just relax while decoupling about the number or name of modules. The
first pass should be about breaking up dependencies to hide the
implementation details so we're free to solve these aggressively without
inhibitions.

Then, once we see a clear dependency picture between modules, we can take
another pass at consolidation as a separate concern and discussion. Until we
see the real dependency picture fall out from refactoring, discussing it is
moot.


> Otherwise, I like this decomposition.
>
> There are a few more things we will have to discuss about :
> - ldif (part of ldap-model/ldap-api)
> - aci (but it may be in a separate module, a ADS specific one, as it's only
> good for ADS
> - trigger (same as above)
> - csn (maybe part of shared-util)
> - dsml (a separate module ?)
> - client api (connection, futures, exceptions) (part of
> ldap-model/ldap-api)
> - i18n (separate module would be good)
> - the schema loader probably deserves a separate ADS module too
> - the schema converter too
>
>  The next step would be to make these artifacts into OSGi bundles. There
>> will
>> be nothing special about it. I'm just going to leverage bundle packaging
>> to
>> hide implementation classes which you cannot do as easily with regular
>> jars
>> with explicit package exports.
>>
> That should be a no brainer.
>
>  Once this is done, we can export a minimal set of classes from the codec,
>> hide it's remainder, and have the model interfaces be the primary
>> dependency
>> used by the client API without exposing implementation classes and keeping
>> the API weight (surface area) down.
>>
>> There's a lot more to do, the job is 40% complete. The wait for the AP
>> merge
>> makes this work feel moot since the merge is going to be nasty so I might
>> just redo this again after Emmanuel merges. That lets me be a bit more
>> agressive and experimental for now.
>>
> go for it. As soon as you have something stable, as it's all about moving
> pieces, we can do that bit by bit, instead of merging.
>
>  Plus if Pierre and Seelman decide to opt for using m2eclipse+Maven+Tyco
>> (as
>> Jesse mentioned) for the Studio build then these refactorings a second
>> time
>> will not incur manual fixing in Studio which depends on shared now. I can
>> refactor Studio at the same time.
>>
> The real issue here is m2eclipse : it's everything but usable for a project
> as big as ADS. I have tried it again one month ago, and it smell like Maven
> 1 to me...
>
>
>
>> Conclusions
>> -----------------
>>
>> So this example shows some things we can do to make things tighter and
>> easier for us to better manage our API's. We can do anything we like to
>> the
>> implementation to fix bugs and to improve performance in point releases
>> without impacting the minimal interfaces we expose for the API.
>>
>
> And it can be a good opportunity to clean up the shared module which has
> become a giant plate of spaghetti (with bolognese sauce).
>
>  We take similar steps inside the server to restrict down the exposed SPI
>> however using OSGi is probably not going to be an option there right away
>> since it gets more complicated. Here in shared I would use bundle
>> packaging
>> just to hide implementation classes, not to define services etc.
>> Also there are some classes that were proposed for shared, i.e. DnNode
>> which
>> at this point in time are specific to the server. Sure Studio might use
>> these classes eventually, however these classes are not generic LDAP.
>> These
>> classes can stay in shared but they should be kept in a module separate
>> from
>> the ldap-model for example.
>>
> Agreed. There may be other classes to, they have to be identified.
>
>  Why you may ask? Because these classes are not
>> generic LDAP classes (like Entry, or Dn, or Cursor is generic and) are not
>> needed by every client, nor are they viable for every server a client
>> connects to. They only serve a purpose when used in Studio, connecting to
>> ApacheDS.
>>
> They are helper classes. They certainly don't belongs to
> ldap-model/ldap-api, and if they have to stay in shared, I would like to
> move it to utils.
>
>
We discussed this last night. Just wanted to point out the clarification you
made to me.

Shared will have shared-utils for generic utility classes that can be used
by anything, not just LDAP code. Then there may be a shared-ldap-util, but I
think this might be overkill and not such a good idea. Let me explain
technically why.

If we dump these ldap specific utility classes into an ldap-util, then a
dependency to one util class pulls in utility classes in the rest of the
module increasing footprint perhaps needlessly.

What is needed for minimal generic client operation should be kept together
with as few dependencies as possible. No need to expose the plethora of
utility classes we have amassed in there.

Yes, they are very useful utilities, but it's not about packaging freebies;
it's about keeping things small and tight, and minimizing exposure. We can
package these things into a separate jar and only use them in Studio and in
ApacheDS.

So in conclusion what I am saying is the general formula we have become
accustomed to where we throw all utility classes into one module no longer
works blindly for us. We need to think about what needless dependencies and
classes this is including.


>  DnNode might be needed by Studio in the future for making a plugin and
>> widget that allows users to graphically manage the boundaries of
>> administrative areas, however it's not something every client needs, and
>> it
>> certainly is not something needed by a generic client connecting to every
>> server.
>>
> --> utils.
>
>
Again, let's be more specific, but let's not overly force ourselves right
away. Perhaps there's a better name for this trapped in our heads. Give it
time to trickle out. As we decouple and break things apart this will become
much easier to see clearly. Just a general MO suggestion.

-- 
Alex Karasulu
My Blog :: http://www.jroller.com/akarasulu/
Apache Directory Server :: http://directory.apache.org
Apache MINA :: http://mina.apache.org
To set up a meeting with me: http://tungle.me/AlexKarasulu

Re: [DISCUSSION] General API & SPI Concerns

Posted by Alex Karasulu <ak...@apache.org>.
On Thu, Jan 6, 2011 at 4:49 AM, Emmanuel Lecharny <el...@gmail.com>wrote:

> On 1/6/11 2:36 AM, Alex Karasulu wrote:
>
>> Hi all,
>>
>> Excuse the cross post but this also has significance to the API list.
>>
>> Problem
>> ------------
>>
>> For our benefit and the benefit of our users we need to be uber careful
>> with
>> changes after a major GA release. We have another thread where it seems
>> people agree with the Eclipse scheme of versioning and this sounds really
>> flexible for our needs. We can do a 2.0.0-M1 release at any time without
>> clamping down on API's. Only when we do a RC do we have to freeze changes
>> to
>> interfaces.
>>
>> The debate still remains as to what constitues an interface. Emmanuel
>> seems
>> to disagree with configuration, schema, and partition db formats as being
>> interfaces of concern but for the time being we can just discuss those we
>> do
>> agree on. There's no doubt about APIs and SPIs.
>>
>
> I don't disagree with Schema, but Schema are clearly defined by RFCs, there
> is no possible interpretation about their syntax and definition.


Absolutely I do agree with you. I was thinking there's bug or mistake we
find with our core published schema or we change the ApacheDS meta schema.
In this case we'd need to bump up to a major version because then just a
software update will not solve problems with already created entries on disk
using the older schema. The situation will be undefined - very hard to
predict.


> However, the schema manipulation API is in the scope of this discussion.
>
>
This is part of the LDAP API and is as critical as Dn, or Entry since it's
tied together.


> Partition and configuration are not part of the Ldap API, thus are
> irrelevant in this discussion about shared refactoring.
>
>
Right this has nothing to do with shared APIs but is relavent to the server.
The same policies in API maintenance in shared will have to be applied to
the server.

To me anything exposed is something to consider for backwards compatibility,
not talking just API here. Whether it's LDAP extended operations, web
services, or database formats, these things impact backwards compatibility.


>
>  Solution
>> ------------
>>
>> So how do we make this as painless to us and users as much as is possible?
>> The best way is to keep the surface area of the SPI or API small, create
>> solid boundaries, and avoid exposing implementation details and
>> implementation classes.
>>
>> By reducing the surface area with implementation hiding we can effectively
>> limit exposure and reduce the probability of needing to make a change that
>> breaks with our user contract. You might be asking what's a real world
>> example of this for us in shared?
>>
>> And incidentally this is one of the things I've been working on in my
>> branch.
>>
>>
>> Real World Example in Shared
>> --------------------------------------------
>>
>> Let's take the o.a.d.s.ldap.message package as an example. This package
>> contains classes and interfaces modeling LDAP requests and responses: i.e.
>> AddRequest, DeleteResponse etc. It's in the shared-ldap module.
>>
>> In this package, in addition to request response interfaces, we're
>> exposing
>> implementation classes for them. The implementation classes, in turn have
>> dependencies on o.a.d.s.ldap.codec.* packages.
>>
> Not any more, I hope. We did a big refactoring last september in order to
> remove this coupling. Of course, we may have some remaining dependencies,
> but this is more or less not intentional.
>
>
Right not intentional and this is just one example in many. Look we've all
used shared as a dumping ground. While our primary focus was a tough problem
in Studio or in ApacheDS we put minimal energy into shared as we deposited
some classes and interfaces into it. This is because the main focus was
something else.

This is not to blame anyone. I am pointing out the problem, and pointing out
a solution to it so we're not screwed by it. The web of dependencies in
shared will f**k us down the line if we don't nix em now.


>
>   This is because some
>> implementation classes depend on codec functionality which is an
>> implementation detail.
>>
>
> Not true anymore (or is it?).
>
>
Yeah there's some residual dependencies but not a big deal to fix. Trivial
stuff.

The work needed here is a joke really. The big issue with it is the impact
the changes in shared have all over the place in Studio and ApacheDS and the
fact that we're better off waiting for AP work to complete to merge.


>  This might be due to eager reuse or the addition of
>> utility methods into codec classes for convenience. Some of these
>> dependencies can be removed by breaking out non-implementation specific
>> methods and constants in codec classes into utility methods outside of the
>> package or the module all together. Furthermore the codec implementation
>> that handles [de]marshaling has to access package friendly (non-API)
>> methods
>> on implementation classes while encoding.
>>
> Not sure that I get what you mean here. Can you be a bit more explicit ?


LdapEncoder accesses package friendly methods inside most message Impl
clases to encode them. This also pulls into message dependencies from codec
which can be hidden. But these are really easy to fix. We just need to know
that the situation is there and get rid of it.


>
>  In the end, dependency upon further transitive dependencies are making us
>> expose almost all implementation classes in shared, and most can easily be
>> decoupled and hidden. It's effectively making everything in shared come
>> together in one big heap exposing way more than we want to.
>>
> It's quite impossible in Java to 'hide' all the classes that a user should
> not manipulate. Unless you use package protected classes, and it quickly has
> a limit, I would rather think in term of 'exposed' (ie documented) API.


OSGi bundles really helps in this respect. It fills in where Java left off.

OSGi makes it so the (bundle) packaging coincides with module boundaries. In
Java this is loose and there's leakage all over, as you say, it's very hard
to hide all implementation classes.


> That this documented API is gathered in one separate module for convenience
> is another aspect, but the user will still have to depend on all the other
> modules.
>
>
Certainly, you're right, dependencies will still exist. A codec will be
depended upon for it's functionality even if we do hide the implementation
details under the hood.

The value add here is not from avoiding a dependency. It's from not exposing
more than we have to and being able to hide the implementation. This way we
can change the implementation at will across point releases without having
to bump up to a major revision.


> So all in all, should we define a module (a maven module) containing the
> public API and the associated implementation ? Probably (But this is not an
> absolute necessity). I guess this is what you have in mind, so let's see
> what's the proposal is...
>
>
We have multiple options for chopping this up. With bundles we have a nice
tool to carve out physical not just logical boundaries to our API's and only
expose those packages we need to show API users.


>
>> LDAP Client API
>> ------------------------
>>
>> Everyone agrees that this API is very important to get right with a 1.0.
>> Right now this API pulls in several public interfaces directly from
>> shared.
>> Those interfaces also pull in some implementation classes. The logical API
>> extends into shared this way. Effectively the majority of shared is
>> exposed
>> by the client API. The client API does not end at it's jar boundary.
>>
>> All this exposure increases the chances of API change when all
>> implementation details are wide open and part of the client API.  And this
>> is what I'm trying to limit. There are ways we can decouple these
>> dependencies very nicely with a mixed bag of refactoring techniques while
>> breaking up shared-ldap into lesser more coherent modules. The idea is to
>> expose the bare minimum of only what we need to expose. Yes the shared
>> code
>> has become very stable over time but the most stability is in the
>> interfaces
>> and if we only expose these instead of implementation classes then we'll
>> have an awesome API that may remain 1.X for a while and not require
>> deprecations as new functionality is introduced.
>>
>
> How will you limit the visibility of the modules you don't want the user to
> be exposed to ?
>
>
A combination of refactoring techniques will be used to be able to better
use standard Java protection mechanisms to hide implementation details
combined with using OSGi bundles instead of Jars to only export those
packages that we do want users to see.


>  Finishing Up the Example
>> -------------------------------------
>>
>> So what concrete things can we do?
>>
>> The biggest step is to hide as many of the implementation classes as
>> possible. In my experimental branch I started by:
>>
>>     (1) Moving out methods and constants in codec classes causing
>> unnecessary dependencies from message package classes and interfaces.
>> There
>> was a situation even where StringTools for example depended on codec
>> classes, and virtually everything doing string related operations used
>> StringTools there by causing man interdependencies. It then becomes a web
>> of
>> dependencies across packages.
>>
> There is *one* method in StringTools that calls a codec method :
> Hex.encodeHex. It's a mistake, as we already have another StringTools method
> (toHexString) doing the same thing (to be double chekced). This is typically
> a wrong usage of a class from a wrong package, and we should get rid of such
> coupling.
>

Yep, no big deal something that gets fixed in seconds but we have a few of
these kinds of examples. One by one they're nothing but all together they
create this web making almost everything dependent transitively on each
other. This is easy to fix.


>
> This is extremely painful to do such a cleanup without first decoupling all
> the pieces by creating separate jars, before regrouping the packages back
> again.
>
>
Why bother regrouping? We can regroup things for convenience if people want
a single jar without deps like the shard-all thingy.

We should not be uncomfortable having multiple modules to better decouple
this big hunk of code, and isolate coherent pieces as units.


> The question here is more to know how far we want to go, considering that
> shared contains 900 classes, more than 5600 methods and around 80 packages.
>
>
Yep it's big but the problem here is not massive. It starts slowly solving a
couple things and once you decouple a few things, decoupling others becomes
much much easier and a layout to all of it starts falling out nicely, which
shows even if we dumped here and created some cleanup issues for ourselves
the overall code really was written well.


>      (2) Breaking up shared into multiple Maven modules so now there's the
>> following modules:
>>
>>           o shared-util
>>           o asn1-api
>>           o asn1-ber
>>           o ldap-model
>>                  - name pkg
>>                  - message pkg (no impl classes)
>>                  - schema pkg
>>                  - cursor pkg
>>                  - filter pkg
>>                  - entry pkg
>>                  - constants pkg
>>           o ldap-codec (not complete)
>>
> I would not have 2 maven modules for asn1. It's probably overkilling. I
> would rather name the ldap-model ldap-api, because this is exactly what it
> is.
>

There are reasons for this to be able to get the codec to be separable. Once
you get in and play with the little non-important details you'll probably
come to the same conclusion yourself.

Let me propose a methodology we can follow here to speed things up without
needlessly arguing each point because in the end we do in fact come to the
same conclusions.

Let's just relax while decoupling about the number or name of modules. The
first pass should be about breaking up dependencies to hide the
implementation details so we're free to solve these aggressively without
inhibitions.

Then once we see a clear dependency between modules, we can take another
pass at consolidation as a separate concern and discussion. Until we see the
real dependency picture fall out from refactoring it's moot over discussing
it.


> Otherwise, I like this decomposition.
>
> There are a few more things we will have to discuss about :
> - ldif (part of ldap-model/ldap-api)
> - aci (but it may be in a separate module, a ADS specific one, as it's only
> good for ADS
> - trigger (same as above)
> - csn (maybe part of shared-util)
> - dsml (a separate module ?)
> - client api (connection, futures, exceptions) (part of
> ldap-model/ldap-api)
> - i18n (separate module would be good)
> - the schema loader probably deserves a separate ADS module too
> - the schema converter too
>
>  The next step would be to make these artifacts into OSGi bundles. There
>> will
>> be nothing special about it. I'm just going to leverage bundle packaging
>> to
>> hide implementation classes which you cannot do as easily with regular
>> jars
>> with explicit package exports.
>>
> That should be a no brainer.
>
>  Once this is done, we can export a minimal set of classes from the codec,
>> hide it's remainder, and have the model interfaces be the primary
>> dependency
>> used by the client API without exposing implementation classes and keeping
>> the API weight (surface area) down.
>>
>> There's a lot more to do, the job is 40% complete. The wait for the AP
>> merge
>> makes this work feel moot since the merge is going to be nasty so I might
>> just redo this again after Emmanuel merges. That lets me be a bit more
>> agressive and experimental for now.
>>
> go for it. As soon as you have something stable, as it's all about moving
> pieces, we can do that bit by bit, instead of merging.
>
>  Plus if Pierre and Seelman decide to opt for using m2eclipse+Maven+Tyco
>> (as
>> Jesse mentioned) for the Studio build then these refactorings a second
>> time
>> will not incur manual fixing in Studio which depends on shared now. I can
>> refactor Studio at the same time.
>>
> The real issue here is m2eclipse : it's everything but usable for a project
> as big as ADS. I have tried it again one month ago, and it smell like Maven
> 1 to me...
>
>
>
>> Conclusions
>> -----------------
>>
>> So this example shows some things we can do to make things tighter and
>> easier for us to better manage our API's. We can do anything we like to
>> the
>> implementation to fix bugs and to improve performance in point releases
>> without impacting the minimal interfaces we expose for the API.
>>
>
> And it can be a good opportunity to clean up the shared module which has
> become a giant plate of spaghetti (with bolognese sauce).
>
>  We take similar steps inside the server to restrict down the exposed SPI
>> however using OSGi is probably not going to be an option there right away
>> since it gets more complicated. Here in shared I would use bundle
>> packaging
>> just to hide implementation classes, not to define services etc.
>> Also there are some classes that were proposed for shared, i.e. DnNode
>> which
>> at this point in time are specific to the server. Sure Studio might use
>> these classes eventually, however these classes are not generic LDAP.
>> These
>> classes can stay in shared but they should be kept in a module separate
>> from
>> the ldap-model for example.
>>
> Agreed. There may be other classes too; they have to be identified.
>
>  Why you may ask? Because these classes are not
>> generic LDAP classes (the way Entry, Dn, or Cursor are) and are not
>> needed by every client, nor are they viable for every server a client
>> connects to. They only serve a purpose when used in Studio, connecting to
>> ApacheDS.
>>
> They are helper classes. They certainly don't belong in
> ldap-model/ldap-api, and if they have to stay in shared, I would like to
> move them to utils.
>
>
We discussed this last night. Just wanted to point out the clarification you
made to me.

Shared will have shared-utils for generic utility classes that can be used
by anything, not just LDAP code. Then there may be a shared-ldap-util, but I
think this might be overkill and not such a good idea. Let me explain
technically why:

If we dump these LDAP-specific utility classes into an ldap-util, then a
dependency on one util class pulls in every other utility class in the
module, increasing the footprint, perhaps needlessly.

What is needed for minimal generic client operation should be kept together
with as few dependencies as possible. No need to expose the plethora of
utility classes we have amassed in there.

Yes, they are very useful utilities, but it's not about packaging freebies;
it's about keeping things small and tight, and minimizing exposure. We can
package these things into a separate jar and only use them in Studio and in
ApacheDS.

So in conclusion, what I am saying is that the general formula we have become
accustomed to, where we throw all utility classes into one module, no longer
works blindly for us. We need to think about which needless dependencies and
classes this pulls in.


>  DnNode might be needed by Studio in the future for making a plugin and
>> widget that allows users to graphically manage the boundaries of
>> administrative areas, however it's not something every client needs, and
>> it
>> certainly is not something needed by a generic client connecting to every
>> server.
>>
> --> utils.
>
>
Again, let's be more specific, but let's not force ourselves right away;
perhaps there's a better name for this trapped in our heads. Give it time
to trickle out. As we decouple and break things apart this will become much
easier to see clearly. Just a general MO suggestion.

-- 
Alex Karasulu
My Blog :: http://www.jroller.com/akarasulu/
Apache Directory Server :: http://directory.apache.org
Apache MINA :: http://mina.apache.org
To set up a meeting with me: http://tungle.me/AlexKarasulu

Re: [DISCUSSION] General API & SPI Concerns

Posted by Emmanuel Lecharny <el...@gmail.com>.
On 1/6/11 2:36 AM, Alex Karasulu wrote:
> Hi all,
>
> Excuse the cross post but this also has significance to the API list.
>
> Problem
> ------------
>
> For our benefit and the benefit of our users we need to be uber careful with
> changes after a major GA release. We have another thread where it seems
> people agree with the Eclipse scheme of versioning and this sounds really
> flexible for our needs. We can do a 2.0.0-M1 release at any time without
> clamping down on API's. Only when we do a RC do we have to freeze changes to
> interfaces.
>
> The debate still remains as to what constitutes an interface. Emmanuel seems
> to disagree with configuration, schema, and partition db formats as being
> interfaces of concern but for the time being we can just discuss those we do
> agree on. There's no doubt about APIs and SPIs.

I don't disagree about schema, but schemas are clearly defined by RFCs;
there is no room for interpretation of their syntax and definition.
However, the schema manipulation API is in the scope of this discussion.

Partition and configuration are not part of the Ldap API, thus are 
irrelevant in this discussion about shared refactoring.

> Solution
> ------------
>
> So how do we make this as painless to us and users as much as is possible?
> The best way is to keep the surface area of the SPI or API small, create
> solid boundaries, and avoid exposing implementation details and
> implementation classes.
>
> By reducing the surface area with implementation hiding we can effectively
> limit exposure and reduce the probability of needing to make a change that
> breaks with our user contract. You might be asking what's a real world
> example of this for us in shared?
>
> And incidentally this is one of the things I've been working on in my
> branch.
>
>
> Real World Example in Shared
> --------------------------------------------
>
> Let's take the o.a.d.s.ldap.message package as an example. This package
> contains classes and interfaces modeling LDAP requests and responses: i.e.
> AddRequest, DeleteResponse etc. It's in the shared-ldap module.
>
> In this package, in addition to request response interfaces, we're exposing
> implementation classes for them. The implementation classes, in turn have
> dependencies on o.a.d.s.ldap.codec.* packages.
Not any more, I hope. We did a big refactoring last September in order
to remove this coupling. Of course, some dependencies may remain,
but they are not intentional.

>   This is because some
> implementation classes depend on codec functionality which is an
> implementation detail.

Not true anymore (or is it?).
> This might be due to eager reuse or the addition of
> utility methods into codec classes for convenience. Some of these
> dependencies can be removed by breaking out non-implementation specific
> methods and constants in codec classes into utility methods outside of the
> package or the module all together. Furthermore the codec implementation
> that handles [de]marshaling has to access package friendly (non-API) methods
> on implementation classes while encoding.
Not sure that I get what you mean here. Can you be a bit more explicit?
> In the end, dependencies upon further transitive dependencies are making us
> expose almost all implementation classes in shared, and most can easily be
> decoupled and hidden. It's effectively making everything in shared come
> together in one big heap exposing way more than we want to.
It's quite impossible in Java to 'hide' all the classes that a user
should not manipulate. Unless you use package protected classes, which
quickly hits a limit, I would rather think in terms of an 'exposed' (i.e.
documented) API. That this documented API is gathered in one separate
module for convenience is another aspect, but the user will still have
to depend on all the other modules.
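
To make the package protected option concrete, here is a minimal sketch of a
public interface backed by a hidden implementation and a factory. All names
(MessageApiSketch, the simplified AddRequest, getEntryDn, newAddRequest) are
hypothetical stand-ins for illustration, not the actual shared-ldap message
API. A private nested class is used only so the sketch fits in one file; in a
real module the implementation would be a package-private class in its own
file.

```java
import java.util.Objects;

/**
 * Sketch only: every name here is hypothetical and simplified; it is not the
 * real shared-ldap message API.
 */
public final class MessageApiSketch {
    private MessageApiSketch() {
        // static holder, not meant to be instantiated
    }

    /** The documented contract: the only type client code should name. */
    public interface AddRequest {
        String getEntryDn();
    }

    /** Public factory, so callers never reference the implementation class. */
    public static AddRequest newAddRequest(String entryDn) {
        return new AddRequestImpl(entryDn);
    }

    /**
     * Hidden implementation. In a real module this would be a package-private
     * class in its own file; a private nested class keeps the sketch in one file.
     */
    private static final class AddRequestImpl implements AddRequest {
        private final String entryDn;

        AddRequestImpl(String entryDn) {
            this.entryDn = Objects.requireNonNull(entryDn, "entryDn");
        }

        @Override
        public String getEntryDn() {
            return entryDn;
        }
    }
}
```

The limit mentioned above shows up as soon as code in another package (a
codec, say) needs the non-API methods of the implementation: Java visibility
alone cannot grant that access without making the class public.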

So all in all, should we define a module (a Maven module) containing the
public API and the associated implementation? Probably (but this is not
an absolute necessity). I guess this is what you have in mind, so let's
see what the proposal is...
>
> LDAP Client API
> ------------------------
>
> Everyone agrees that this API is very important to get right with a 1.0.
> Right now this API pulls in several public interfaces directly from shared.
> Those interfaces also pull in some implementation classes. The logical API
> extends into shared this way. Effectively the majority of shared is exposed
> by the client API. The client API does not end at its jar boundary.
>
> All this exposure increases the chances of API change when all
> implementation details are wide open and part of the client API.  And this
> is what I'm trying to limit. There are ways we can decouple these
> dependencies very nicely with a mixed bag of refactoring techniques while
> breaking up shared-ldap into smaller, more coherent modules. The idea is to
> expose the bare minimum of only what we need to expose. Yes the shared code
> has become very stable over time but the most stability is in the interfaces
> and if we only expose these instead of implementation classes then we'll
> have an awesome API that may remain 1.X for a while and not require
> deprecations as new functionality is introduced.

How will you limit the visibility of the modules you don't want the user
to be exposed to?
> Finishing Up the Example
> -------------------------------------
>
> So what concrete things can we do?
>
> The biggest step is to hide as many of the implementation classes as
> possible. In my experimental branch I started by:
>
>      (1) Moving out methods and constants in codec classes causing
> unnecessary dependencies from message package classes and interfaces. There
> was a situation even where StringTools for example depended on codec
> classes, and virtually everything doing string related operations used
> StringTools thereby causing many interdependencies. It then becomes a web of
> dependencies across packages.
There is *one* method in StringTools that calls a codec method:
Hex.encodeHex. It's a mistake, as we already have another StringTools
method (toHexString) doing the same thing (to be double-checked). This
is typically a wrong usage of a class from the wrong package, and we
should get rid of such coupling.
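
As a rough illustration of how small a self-contained replacement can be, the
sketch below hex-encodes a byte array using nothing but the JDK, so a util
class written this way needs no codec import. HexSketch is a hypothetical
name and this is not the actual StringTools.toHexString code; its output
format (lowercase, two digits per byte, empty string for null) is an
assumption made for the example.

```java
/**
 * Sketch only: a hypothetical, dependency-free hex encoder. It does not
 * reproduce the real StringTools or Hex classes.
 */
public final class HexSketch {
    private static final char[] HEX_DIGITS = "0123456789abcdef".toCharArray();

    private HexSketch() {
        // utility class, not meant to be instantiated
    }

    /** Returns two lowercase hex digits per byte; null yields an empty string. */
    public static String toHexString(byte[] bytes) {
        if (bytes == null) {
            return "";
        }
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int value = bytes[i] & 0xFF;              // treat the byte as unsigned
            out[2 * i] = HEX_DIGITS[value >>> 4];     // high nibble
            out[2 * i + 1] = HEX_DIGITS[value & 0x0F]; // low nibble
        }
        return new String(out);
    }
}
```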

It is extremely painful to do such a cleanup without first decoupling
all the pieces by creating separate jars, before regrouping the packages
back again.

The question here is more about how far we want to go, considering
that shared contains 900 classes, more than 5600 methods and around 80
packages.
>      (2) Breaking up shared into multiple Maven modules so now there's the
> following modules:
>
>            o shared-util
>            o asn1-api
>            o asn1-ber
>            o ldap-model
>                   - name pkg
>                   - message pkg (no impl classes)
>                   - schema pkg
>                   - cursor pkg
>                   - filter pkg
>                   - entry pkg
>                   - constants pkg
>            o ldap-codec (not complete)
I would not have 2 Maven modules for asn1. It's probably overkill. I
would rather name the ldap-model ldap-api, because this is exactly what
it is.
Otherwise, I like this decomposition.

There are a few more things we will have to discuss:
- ldif (part of ldap-model/ldap-api)
- aci (but it may be in a separate, ADS-specific module, as it's
  only good for ADS)
- trigger (same as above)
- csn (maybe part of shared-util)
- dsml (a separate module?)
- client api (connection, futures, exceptions) (part of ldap-model/ldap-api)
- i18n (a separate module would be good)
- the schema loader probably deserves a separate ADS module too
- the schema converter too
> The next step would be to make these artifacts into OSGi bundles. There will
> be nothing special about it. I'm just going to leverage bundle packaging to
> hide implementation classes which you cannot do as easily with regular jars
> with explicit package exports.
That should be a no brainer.
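
For reference, a minimal sketch of what explicit package exports could look
like with the Apache Felix maven-bundle-plugin, assuming the module's POM
uses `bundle` packaging. The two package names are placeholders chosen for
the example, not an agreed layout.

```xml
<plugin>
  <groupId>org.apache.felix</groupId>
  <artifactId>maven-bundle-plugin</artifactId>
  <extensions>true</extensions>
  <configuration>
    <instructions>
      <!-- Placeholder package names: only the exported package is visible
           to other bundles. -->
      <Export-Package>org.apache.directory.shared.ldap.codec.api</Export-Package>
      <!-- Packaged inside the bundle but not exported. -->
      <Private-Package>org.apache.directory.shared.ldap.codec.impl.*</Private-Package>
    </instructions>
  </configuration>
</plugin>
```

Note that this hiding is only enforced inside an OSGi container; on a plain
classpath the same jar still exposes every public class.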
> Once this is done, we can export a minimal set of classes from the codec,
> hide its remainder, and have the model interfaces be the primary dependency
> used by the client API without exposing implementation classes and keeping
> the API weight (surface area) down.
>
> There's a lot more to do, the job is 40% complete. The wait for the AP merge
> makes this work feel moot since the merge is going to be nasty so I might
> just redo this again after Emmanuel merges. That lets me be a bit more
> aggressive and experimental for now.
Go for it. As soon as you have something stable, as it's all about
moving pieces, we can do that bit by bit, instead of merging.
> Plus if Pierre and Seelman decide to opt for using m2eclipse+Maven+Tycho (as
> Jesse mentioned) for the Studio build then these refactorings a second time
> will not incur manual fixing in Studio which depends on shared now. I can
> refactor Studio at the same time.
The real issue here is m2eclipse: it's anything but usable for a
project as big as ADS. I tried it again a month ago, and it smells
like Maven 1 to me...

>
> Conclusions
> -----------------
>
> So this example shows some things we can do to make things tighter and
> easier for us to better manage our API's. We can do anything we like to the
> implementation to fix bugs and to improve performance in point releases
> without impacting the minimal interfaces we expose for the API.

And it can be a good opportunity to clean up the shared module which has 
become a giant plate of spaghetti (with bolognese sauce).
> We take similar steps inside the server to restrict down the exposed SPI
> however using OSGi is probably not going to be an option there right away
> since it gets more complicated. Here in shared I would use bundle packaging
> just to hide implementation classes, not to define services etc.
> Also there are some classes that were proposed for shared, i.e. DnNode which
> at this point in time are specific to the server. Sure Studio might use
> these classes eventually, however these classes are not generic LDAP. These
> classes can stay in shared but they should be kept in a module separate from
> the ldap-model for example.
Agreed. There may be other classes too; they have to be identified.
> Why you may ask? Because these classes are not
> generic LDAP classes (the way Entry, Dn, or Cursor are) and are not
> needed by every client, nor are they viable for every server a client
> connects to. They only serve a purpose when used in Studio, connecting to
> ApacheDS.
They are helper classes. They certainly don't belong in
ldap-model/ldap-api, and if they have to stay in shared, I would like to
move them to utils.
> DnNode might be needed by Studio in the future for making a plugin and
> widget that allows users to graphically manage the boundaries of
> administrative areas, however it's not something every client needs, and it
> certainly is not something needed by a generic client connecting to every
> server.
--> utils.
> So things like this as well as the category of interfaces and classes used
> for modeling ApacheDS specific features which also are used by Studio should
> be in their own modules, if kept in shared, separate from the model or the
> codec bundles. This way they can remain in shared, used by both Studio and
> ApacheDS without polluting the client API.  As an example, the ACI mechanism
> we use is very ApacheDS specific and is used by Studio's ACI editor. I
> wanted to say X.500 specific, but we've changed our ACIs a tiny bit. So we
> might have an ldap-aci module that pulls these things out of the ldap-model
> so our standard client API remains clean and light, free of our ApacheDS
> specific features.
+1. See above.
> The power behind this API is the number of people and projects that will use
> it. We don't want the OpenDS folks for example to avoid it just because they
> don't want our ApacheDS specific interfaces weighing it down and
> contaminating it. I'd love to see the API used with a light footprint on
> mobile devices, so footprint will matter in this oddball case as well.
>


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com