Posted to dev@directory.apache.org by Alex Karasulu <ak...@apache.org> on 2008/04/02 18:04:07 UTC

[Studio] Using JDBM for secondary cache

Stefan S.,

Do we use a secondary cache for Studio? Just wondering because of the
performance issues someone noted on the user list when dealing with a very
large directory.  The idea occurred to me that the JDBM code for JdbmTable
and JdbmIndex could potentially be used by Studio to help solve some of the
caching problems.  If this is something you think will help, we can move this
code into shared so both the server and Studio can leverage it.
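
As a rough sketch of the idea (assuming the plain JDBM 1.0 API that JdbmTable
and JdbmIndex wrap; the class and method names below are made up for
illustration), a minimal disk-backed DN -> entry cache on the Studio side
could look like this:

import java.io.IOException;

import jdbm.RecordManager;
import jdbm.RecordManagerFactory;
import jdbm.btree.BTree;
import jdbm.helper.StringComparator;

/** Hypothetical disk-backed DN -> serialized entry cache on top of a JDBM BTree. */
public class JdbmEntryCache
{
    private final RecordManager recMan;
    private final BTree tree;

    public JdbmEntryCache( String cacheFile ) throws IOException
    {
        recMan = RecordManagerFactory.createRecordManager( cacheFile );
        tree = BTree.createInstance( recMan, new StringComparator() );
    }

    /** Values must be Serializable; here we simply store the entry as LDIF text. */
    public void put( String normalizedDn, String entryLdif ) throws IOException
    {
        tree.insert( normalizedDn, entryLdif, true );
        recMan.commit();
    }

    public String get( String normalizedDn ) throws IOException
    {
        return ( String ) tree.find( normalizedDn );
    }

    public void close() throws IOException
    {
        recMan.close();
    }
}

The real JdbmTable/JdbmIndex classes add typed serializers and duplicate-key
handling on top of this, which is where moving them into shared would pay off.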

Alex

Re: [Studio] Using JDBM for secondary cache

Posted by Emmanuel Lecharny <el...@gmail.com>.
Stefan Seelmann wrote:
> Hi Alex,
>
>
> Alex Karasulu schrieb:
>> Stefan S.,
>>
>> Do we use a secondary cache for Studio? Just wondering because of the 
>> performance issues someone noted on the user list when dealing with a 
>> very large directory.  
>
> no, we don't use a cache for Studio. I tried to use ehcache a long time 
> ago but it was really slow when it swapped entries to disk and back.
Whatever cache you use, as soon as you hit the disk, performance will 
be awful. The question is: do we accept that terrible performance, or 
do we accept an OOM? Now, with JDBM, you have a middle ground. If you 
use an appropriate size for the JDBM cache, it will only contain the 
tree, not the entries, leading to a great performance improvement. You 
will still have to hit the disk to get the data, but you will avoid a 
lot of disk accesses.
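
For instance (assuming JDBM 1.0's RecordManagerOptions; treat the property
name as an assumption), keeping the object cache small so that only the BTree
pages stay in memory would look roughly like:

import java.io.IOException;
import java.util.Properties;

import jdbm.RecordManager;
import jdbm.RecordManagerFactory;
import jdbm.RecordManagerOptions;

public class JdbmCacheTuning
{
    public static RecordManager openWithSmallCache( String file ) throws IOException
    {
        // A small object cache keeps the BTree pages in memory while the
        // (potentially big) entry values are fetched from disk on demand.
        Properties props = new Properties();
        props.setProperty( RecordManagerOptions.CACHE_SIZE, "1000" );

        return RecordManagerFactory.createRecordManager( file, props );
    }
}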

Of course, it depends on the kind of operation you want to run. 
Pierre-Arnaud and I discussed at length the idea of integrating ADS 
_into_ Studio, with the direct benefit of being able to play fake 
operations locally, and also to use its internal cache.

To be discussed, of course...
>
> Today we have the following:
> - a HashMap<String, IEntry>: with the DN as key and the entry as value 
> (IEntry is a Studio internal interface with some implementations)
> - a HashMap<IEntry, AttributeInfo>: as soon as the attributes of an 
> entry are loaded, an AttributeInfo containing all attributes and other 
> information is created and put into this map
> - a HashMap<IEntry, ChildrenInfo>: as soon as child entries are loaded, 
> a ChildrenInfo containing all child entries is created and put into 
> this map.
>
> For sure, this does not scale well. With the default VM parameters 
> (64MB heap) you can load only about 30,000 entries before getting an 
> OutOfMemory :-(((

What about WeakHashMap?
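
As a sketch of what that could look like (IEntry, AttributeInfo and
ChildrenInfo below are just stand-ins for the Studio types; whether the GC
semantics actually fit Studio's usage is the open question):

import java.util.Map;
import java.util.WeakHashMap;

public class WeakEntryCaches
{
    // Stand-ins for the Studio interfaces mentioned above.
    interface IEntry {}
    static class AttributeInfo {}
    static class ChildrenInfo {}

    // Attribute and children info are dropped by the GC once the IEntry key
    // is no longer strongly reachable.
    private final Map<IEntry, AttributeInfo> attributeInfos =
        new WeakHashMap<IEntry, AttributeInfo>();
    private final Map<IEntry, ChildrenInfo> childrenInfos =
        new WeakHashMap<IEntry, ChildrenInfo>();

    // Caveat: the DN -> IEntry map must not keep strong references to the
    // entries (it would need weak or soft values), otherwise the weak keys
    // above are never collected and nothing is freed.
}
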
>
> I also think that the switch from the old DN/RDN implementation to the 
> shared-ldap LdapDN/Rdn implementation costs some memory. We should 
> consider doing some tests for performance and memory consumption.
Performance is OK, but we may eat more memory with LdapDN, as we keep 
two forms of the DN (user-provided and normalized). In your case, I 
would say it's unnoticeable compared with the other attributes 
(especially big binary values).

If we were to dig for memory savings, this is not the place I would start...
>
>
>> The idea occurred to me that the JDBM code for JdbmTable and 
>> JdbmIndex could potentially be used by Studio to help solve some of 
>> the caching problems.  If this is something you think will help, we 
>> can move this code into shared so both the server and Studio can 
>> leverage it.
>
> Yeah, that sounds great. Do you want me to look at how the JDBM code 
> works? I guess we will have some time at ApacheCon for that.
Yes, sure! This is what AC is for!


-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org



Re: [Studio] Using JDBM for secondary cache

Posted by Emmanuel Lecharny <el...@gmail.com>.
Stefan Seelmann wrote:
> Emmanuel Lecharny schrieb:
>>
>> * On the client, we don't really care to store the UP form. What we 
>> need is just to _validate_ the DN, keeping the initial String. Each 
>> time we need to parse the DN, we can do it. We can also implement the 
>> fail-fast parser (ASCII DN parser).
>
> Veto :-)

My bad, I wrote the opposite of what I had in mind: "On the client, we 
don't really care to store the _normalized_ form ..."



-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org



Re: [Studio] Using JDBM for secondary cache

Posted by Stefan Seelmann <se...@apache.org>.
Emmanuel Lecharny schrieb:
> 
> * On the client, we don't really care to store the UP form. What we need 
> is just to _validate_ the DN, keeping the initial String. Each time we 
> need to parse the DN, we can do it. We can also implement the fail-fast 
> parser (ASCII DN parser).

Veto :-)

The DN that comes from the server (via JNDI) is parsed into an LdapDN. We 
use the UP form when presenting the DN to the user, as the user should 
see the DN with all upper- and lower-case characters, spaces, and so on.

In the DN(String)->Entry cache (atm just a HashMap) we use a special 
form of a normalized DN: we replace each attribute type by its OID. That 
is necessary for the "Locate in DIT" function: whenever you have a DN 
under your cursor you can find it in the DIT, whatever form the DN has.

We also use the DN/RDN/ATAV structures to compose a DN, e.g. when 
creating or renaming an entry.
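
As a small illustration (using the JDK's LdapName/Rdn here as a stand-in for
the shared-ldap classes, and hard-coding the OIDs only to show the idea of
the cache key):

import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

public class ComposeDnExample
{
    // UP form (shown in the UI):          CN=Stefan Seelmann, OU=Users, DC=Example, DC=COM
    // Cache key (types replaced by OIDs): 2.5.4.3=stefan seelmann,2.5.4.11=users,0.9.2342.19200300.100.1.25=example,...

    /** Composes the DN of a new or renamed entry from its parent DN and RDN. */
    public static LdapName composeChildDn( String parentDn, String rdnType, String rdnValue )
        throws InvalidNameException
    {
        LdapName dn = new LdapName( parentDn );
        dn.add( new Rdn( rdnType, rdnValue ) );
        return dn;
    }

    public static void main( String[] args ) throws InvalidNameException
    {
        System.out.println( composeChildDn( "ou=Users,dc=Example,dc=COM", "cn", "Stefan Seelmann" ) );
    }
}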

> * On the server, that's another story. We absolutely need to deal with 
> the DN/RDN/AttributeTypeAndValue structures, as we have a lot of 
> controls on them. But the ASCII-only parser can be used.
> 
> I guess we can quickly write the ASCII parser, and give it a try. If we 
> gain a lot, then we have to switch to it in the server.
> 
> 

Kind Regards,
Stefan


Re: [Studio] Using JDBM for secondary cache

Posted by Alex Karasulu <ak...@apache.org>.
On Wed, Apr 2, 2008 at 5:55 PM, Emmanuel Lecharny <el...@gmail.com>
wrote:

>
>
> >    I also think that the switch from the old DN/RDN implementation to
> >    the shared-ldap LdapDN/Rdn implementation costs some memory. We
> >    should consider doing some tests for performance and memory
> >    consumption.
> >
> >
> > Oh that's not good.  We need to clean up that code anyway, so it might
> > be good to work in some optimization.
> > Emmanuel had a good idea at some point to build a simple parser for DNs
> > along with a simpler LdapDN class for handling the most general cases.  If
> > this parser fails then another corner-case parser continues where the first
> > left off.
> >
> The idea was to assume that a DN uses only ASCII chars. If not, throw an
> exception and fall back to the standard parser. This can save some cycles,
> that's for sure.
>

That was one of the ideas.  The other was to keep things that complicate DN
parsing, or that require additional data structures in the LdapDN, out of the
common path, to reduce processing, complexity and footprint.


>
>
> > All these crazy and complicated corner cases like with multi-attribute
> > Rdns and character issues cost more memory.
> >
> No, not at all. It costs time, not memory. We don't store anything but
> Strings.
>

You have some extra storage and references for handling multi-attribute
Rdns.  This costs memory too.


>
> > They can then be handled by this special DN parser with its respective
> > special LdapDN object that has additional structures for tracking these
> > complex DNs.
> >
> > If 99% of the time the simple LDAP DNs are used with smaller footprint,
> > then we can reduce complexity and memory usage, while increasing
> > performance.  This will have an impact for both ApacheDS and Studio.
> >
> Depending on which side we are handling DNs, we may implement different
> optimizations.
>
> * On the client, we don't really care to stroe the UP form.


Right.  The updn and the dn need to be separate objects rather than mixing
these concerns.  It's a mess in there right now with everything mixed together.
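
Roughly what that separation could look like (hypothetical class names, just
to show the shape):

/**
 * Hypothetical sketch: two small immutable holders instead of one LdapDN
 * carrying both forms plus the parsing machinery.
 */
public class DnHolders
{
    /** The user-provided form, kept verbatim for display. */
    public static final class UserProvidedDn
    {
        private final String upName;

        public UserProvidedDn( String upName )
        {
            this.upName = upName;
        }

        public String getUpName()
        {
            return upName;
        }
    }

    /** The normalized form, used only for comparisons and cache keys. */
    public static final class NormalizedDn
    {
        private final String normName;

        public NormalizedDn( String normName )
        {
            this.normName = normName;
        }

        public String getNormName()
        {
            return normName;
        }
    }
}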

Alex

Re: [Studio] Using JDBM for secondary cache

Posted by Emmanuel Lecharny <el...@gmail.com>.
>
>     I also think that the switch from the old DN/RDN implementation to
>     the shared-ldap LdapDN/Rdn implementation costs some memory. We
>     should consider doing some tests for performance and memory
>     consumption.
>
>
> Oh that's not good.  We need to clean up that code anyway, so it might 
> be good to work in some optimization. 
>
> Emmanuel had a good idea at some point to build a simple parser for 
> DNs along with a simpler LdapDN class for handling the most general 
> cases.  If this parser fails then another corner-case parser continues 
> where the first left off.
The idea was to assume that a DN uses only ASCII chars. If not, throw an 
exception and fall back to the standard parser. This can save some 
cycles, that's for sure.
>
> All these crazy and complicated corner cases like with multi-attribute 
> Rdns and character issues cost more memory. 
No, not at all. It costs time, not memory. We don't store anything but 
Strings.
> They can then be handled by this special DN parser with its respective 
> special LdapDN object that has additional structures for tracking 
> these complex DNs.
>
> If 99% of the time the simple LDAP DNs are used with smaller 
> footprint, then we can reduce complexity and memory usage, while 
> increasing performance.  This will have an impact for both ApacheDS 
> and Studio.
Depending on which side we are handling DNs, we may implement different 
optimizations.

* On the client, we don't really care to store the UP form. What we need 
is just to _validate_ the DN, keeping the initial String. Each time we 
need to parse the DN, we can do it. We can also implement the fail-fast 
parser (ASCII DN parser).
* On the server, that's another story. We absolutely need to deal with 
the DN/RDN/AttributeTypeAndValue structures, as we have a lot of 
controls on them. But the ASCII-only parser can be used.

I guess we can quickly write the ASCII parser, and give it a try. If we 
gain a lot, then we have to switch to it in the server.
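
A sketch of the fail-fast check (class and method names are made up; it only
decides whether the cheap ASCII-only path can be taken, otherwise we fall back
to the full parser):

public final class AsciiDnCheck
{
    private AsciiDnCheck()
    {
    }

    /**
     * Returns true if the DN contains only ASCII characters and none of the
     * constructs that force the full parser (escapes, quotes, multi-valued
     * RDNs).
     */
    public static boolean isSimpleAsciiDn( String dn )
    {
        for ( int i = 0; i < dn.length(); i++ )
        {
            char c = dn.charAt( i );

            if ( c > 127 || c == '\\' || c == '"' || c == '+' )
            {
                // Non-ASCII, escaped or multi-valued RDN: use the standard parser.
                return false;
            }
        }

        return true;
    }
}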


-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org



Re: [Studio] Using JDBM for secondary cache

Posted by Alex Karasulu <ak...@apache.org>.
Hi Stefan,

On Wed, Apr 2, 2008 at 3:27 PM, Stefan Seelmann <se...@apache.org> wrote:

> Hi Alex,
>
>
> Alex Karasulu schrieb:
>
> > Stefan S.,
> >
> > Do we use a secondary cache for Studio? Just wondering because of the
> > performance issues someone noted on the user list when dealing with a very
> > large directory.
> >
>
> no, we don't use a cache for Studio. I tried to use ehcache a long time ago
> but it was really slow when it swapped entries to disk and back.
>
> Today we have the following:
> - a HashMap<String, IEntry>: with the DN as key and the entry as value
> (IEntry is a Studio internal interface with some implementations)
> - a HashMap<IEntry, AttributeInfo>: as soon as the attributes of an entry
> are loaded, an AttributeInfo containing all attributes and other information
> is created and put into this map
> - a HashMap<IEntry, ChildrenInfo>: as soon as child entries are loaded, a
> ChildrenInfo containing all child entries is created and put into this map.
>
> For sure, this does not scale well. With the default VM parameters (64MB
> heap) you can load only about 30,000 entries before getting an OutOfMemory
> :-(((
>
> I also think that the switch from the old DN/RDN implementation to the
> shared-ldap LdapDN/Rdn implementation costs some memory. We should consider
> doing some tests for performance and memory consumption.
>

Oh that's not good.  We need to clean up that code anyway, so it might be good
to work in some optimization.

Emmanuel had a good idea at some point to build a simple parser for DNs
along with a simpler LdapDN class for handling the most general cases.  If this
parser fails then another corner-case parser continues where the first left
off.

All these crazy and complicated corner cases, like multi-attribute Rdns
and character issues, cost more memory.  They can then be handled by this
special DN parser with its respective special LdapDN object that has
additional structures for tracking these complex DNs.

If 99% of the time the simple LDAP DNs are used with smaller footprint, then
we can reduce complexity and memory usage, while increasing performance.
This will have an impact for both ApacheDS and Studio.


>
>
> > The idea occurred to me that the JDBM code for JdbmTable and JdbmIndex
> > could potentially be used by Studio to help solve some of the caching
> > problems.  If this is something you think will help, we can move this code
> > into shared so both the server and Studio can leverage it.
> >
>
> Yeah, that sounds great. Do you want me to look at how the JDBM code works?
> I guess we will have some time at ApacheCon for that.
>

I was just pointing it out in case you were interested in using it.  This is
just a wrapper that abstracts most BTree implementations behind a common
interface.  The JDBM implementation, which we use by default, might be handy
as a secondary cache to swap entries out to.
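
To give an idea of the shape, a hypothetical, slimmed-down version of such a
common interface (the real interfaces in the server are richer than this)
could be:

import java.io.IOException;

/** A key/value store that a JDBM BTree, or any other BTree, can back. */
public interface SimpleTable<K, V>
{
    /** Returns the value stored for the key, or null if absent. */
    V get( K key ) throws IOException;

    /** Stores or replaces the value for the key. */
    void put( K key, V value ) throws IOException;

    /** Removes the key and its value. */
    void remove( K key ) throws IOException;

    /** Number of stored key/value pairs. */
    int count() throws IOException;

    /** Releases the underlying resources (record manager, files, ...). */
    void close() throws IOException;
}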

We can talk about it if you want to use it at AC.  Also here's a link to the
documentation (which I am really proud of :D):


http://cwiki.apache.org/confluence/display/DIRxSRVx11/Index+and+IndexEntry

Regards,
Alex

Re: [Studio] Using JDBM for secondary cache

Posted by Stefan Seelmann <se...@apache.org>.
Hi Alex,


Alex Karasulu schrieb:
> Stefan S.,
> 
> Do we use a secondary cache for Studio? Just wondering because of the 
> performance issues someone noted on the user list when dealing with a 
> very large directory.  

no, we don't use a cache for Studio. I tried to use ehcache a long time 
ago but it was really slow when it swapped entries to disk and back.

Today we have the following:
- a HashMap<String, IEntry>: with the DN as key and the entry as value 
(IEntry is a Studio internal interface with some implementations)
- a HashMap<IEntry, AttributeInfo>: as soon as the attributes of an 
entry are loaded, an AttributeInfo containing all attributes and other 
information is created and put into this map
- a HashMap<IEntry, ChildrenInfo>: as soon as child entries are loaded, a 
ChildrenInfo containing all child entries is created and put into this map.

For sure, this does not scale well. With the default VM parameters (64MB 
heap) you can load only about 30,000 entries before getting an 
OutOfMemory :-(((

I also think that the switch from the old DN/RDN implementation to the 
shared-ldap LdapDN/Rdn implementation costs some memory. We should 
consider doing some tests for performance and memory consumption.


> The idea occurred to me that the JDBM code for 
> JdbmTable and JdbmIndex could potentially be used by Studio to help 
> solve some of the caching problems.  If this is something you think will 
> help, we can move this code into shared so both the server and Studio can 
> leverage it.

Yeah, that sounds great. Do you want me to look at how the JDBM code 
works? I guess we will have some time at ApacheCon for that.

> 
> Alex

Kind Regards,
Stefan Seelmann