Posted to users@directory.apache.org by Juergen Weber <we...@gmail.com> on 2007/02/07 21:50:59 UTC

JDBC/Derby backend

On 2/7/07, Alex Karasulu <ak...@apache.org> wrote:

>
> >> Has anybody tried a Derby backend for ds?
>
> BTW I inquired about using Derby with Debrunner a few years ago at an
> AC.  He basically stated that it would be a bad move since Derby would
> be challenged to deal with hierarchies.

Well, IBM's LDAP Server for z/OS does use DB2 as backend and it's very fast.

Of course you have to map hierarchical data structures to tables but
IBM showed that this is possible.

This http://www.openldap.org/faq/data/cache/378.html OpenLDAP FAQ
entry more or less discourages the use of an SQL backend.
But in this same FAQ entry Kurt Zeilenga gives a link to an IBM paper
describing their database backend which quotes good performance (which
matches our user experience with the z/OS LDAP server).
http://www.research.ibm.com/journal/sj/392/shi.html

A JDBC backend for DS could combine fast Java network and data
structure handling with a fast Enterprise DBMS.

For embedded use Derby should serve well.
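
Just as a sketch (the database name and table below are made up; the
driver class and URL syntax are standard Derby), embedding it needs
little more than the embedded driver and a JDBC URL:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class EmbeddedDerbySketch {
    public static void main(String[] args) throws Exception {
        // The embedded driver runs the database engine inside the same JVM;
        // no separate server process is needed.
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");

        // ';create=true' creates the database directory on first use.
        try (Connection con =
                 DriverManager.getConnection("jdbc:derby:embeddedDemo;create=true");
             Statement st = con.createStatement()) {
            st.executeUpdate("CREATE TABLE DEMO (ID BIGINT, NAME VARCHAR(64))");
        }

        // Derby reports a clean engine shutdown as an SQLException (SQLState XJ015).
        try {
            DriverManager.getConnection("jdbc:derby:;shutdown=true");
        } catch (SQLException expectedOnShutdown) {
            // expected on a successful shutdown
        }
    }
}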

End of '05 I ran some transaction tests against Derby and got about 20
TX/sec on my PC which seemed very fast to me.
As transactions are not very important for LDAP servers, only read
performance would matter.
It would definitely be interesting to have numbers for Berkeley DB
(which of course is not relational).

Juergen

Re: JDBC/Derby backend

Posted by Stefan Zoerner <sz...@apache.org>.
Hi Juergen!

Juergen Weber wrote:
> Well, IBM's LDAP Server for z/OS does use DB2 as backend and it's very 
> fast.

The same is true for this product on other platforms (IBM Tivoli 
Directory Server is available for Solaris, Windows, ...). A major 
advantage is that the LDAP server inherits some nice DB2 properties 
(high availability features, distribution), especially on AIX.

> Of course you have to map hierarchical data structures to tables but
> IBM showed that this is possible.

ApacheDS architecture is quite flexible and allows you to extend the 
server with your own partition implementations.

We have had several questions about a JDBC or Derby partition in the 
last months. From my point of view it would be nice to have a JDBC 
partition (whether reasonably performant or not), for instance as an 
example of how to write your own partition.
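
Purely as a thought experiment (this is not the real ApacheDS partition 
API; the class and the DNS/ENTRIES tables, which are sketched later in 
this thread, are hypothetical), the JDBC side of such a partition could 
delegate to a small helper like this:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class JdbcEntryDao {

    private final Connection con;

    public JdbcEntryDao(Connection con) {
        this.con = con;
    }

    // Returns the entry id for a DN, or -1 if no such entry exists.
    public long lookupEid(String dn) throws SQLException {
        try (PreparedStatement ps =
                 con.prepareStatement("SELECT EID FROM DNS WHERE DN = ?")) {
            ps.setString(1, dn);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getLong(1) : -1L;
            }
        }
    }

    // Stores one attribute value; a multi-valued attribute becomes several rows.
    public void addAttributeValue(long eid, String attName, String attVal)
            throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                 "INSERT INTO ENTRIES (EID, ATTNAME, ATTVAL) VALUES (?, ?, ?)")) {
            ps.setLong(1, eid);
            ps.setString(2, attName);
            ps.setString(3, attVal);
            ps.executeUpdate();
        }
    }
}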

Unfortunately, our resources are limited (all volunteers). Currently 
other things have a higher priority. But contributions are always 
appreciated. Perhaps you would like to share your experiences with 
ApacheDS on z/OS, as well.

Greetings from Hamburg,
     Stefan

---8<---

Stefan Zoerner (szoerner@apache.org)
Apache Directory Project
Committer :: PMC Member


Re: JDBC/Derby backend

Posted by Emmanuel Lecharny <el...@gmail.com>.
Juergen Weber wrote:

>
> I think for a JDBC backend the actual database code should be trivial
> in comparison with the database schema and the mapping of LDAP queries to
> SQL queries. Probably one would end up with something quite similar to
> the IBM paper 

Mapping filters to SQL SELECTs might be tricky, for sure...

The structure of an RDBMS schema was sketched out a few months 
ago :
http://cwiki.apache.org/confluence/display/DIRxSRVx11/Backend

But these are only very initial thoughts...

> (is that patented in the US?)

I have no idea ... I don't know if you can patent such trivial ideas ;)

>
> Or a quite trivial mapping:
>
> DNS: DN | EID
>
> and
>
> ENTRIES: EID | ATTNAME | ATTVAL
>
>
> So ldapsearch  -b "o=sevenSeas"   "(givenName=William)"
>
> would map to
>
> select * from DNS D, ENTRIES E where D.EID = E.EID and D.EID like
> 'o=sevenSeas%'
> and E.ATTNAME = 'givenName' and E.ATTVAL = 'William'

Remember that an attribute can have more than one value. Using a schema 
where an entry has three columns can be seen as a waste of space, but it 
can also be a waste of time, as a request will need to do two joins 
instead of one.
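
Just to make that cost concrete, here is a made-up example against the 
DNS/ENTRIES layout quoted above: a filter with two assertions already 
drags the ENTRIES table in twice.

// Made-up illustration: each assertion in the filter adds one more join
// on ENTRIES, on top of the join with DNS.
public class TwoAssertionJoin {

    // (&(objectClass=person)(givenName=William)) against DNS/ENTRIES
    static final String SQL =
          "SELECT D.DN FROM DNS D, ENTRIES E1, ENTRIES E2"
        + " WHERE D.EID = E1.EID AND D.EID = E2.EID"
        + " AND E1.ATTNAME = 'objectClass' AND E1.ATTVAL = 'person'"
        + " AND E2.ATTNAME = 'givenName' AND E2.ATTVAL = 'William'";

    public static void main(String[] args) {
        System.out.println(SQL);
    }
}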

This is where we have a lot of things to analyze, test and evaluate. Just 
consider that we might have more than one index per column. We have to 
deal with six different kinds of searches :
* present : the attribute is simply present, with a non-null value
* equal : the attribute value equals the assertion value, with respect to 
the matching rule (that means we _must_ implement matching rules in the 
database; not so simple...)
* approx : the attribute value is an approximation of the assertion value 
(soundex algorithm). How does that work for non-European languages like 
Cantonese or Hangul?
* greater or equal, less or equal : we need to inject the comparators 
into the database.
* substrings : equivalent to the SQL LIKE operator
* extensible match : I don't know anybody using them, but anyway, it's 
in the spec ...

None of these are easy to implement, and they rely heavily on the LDAP 
schema content. Not exactly a piece of cake :) A rough sketch of the 
easier cases is below.
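
Purely as an illustration, here is a sketch of the easy cases only, 
against the three-column ENTRIES table (hypothetical names, no matching 
rules, and string pasting instead of bind variables):

public class FilterToSqlSketch {

    // (attr=*) : the attribute is simply present
    static String present(String attr) {
        return "E.ATTNAME = '" + attr + "'";
    }

    // (attr=value) : equality, ignoring the attribute's EQUALITY matching rule
    static String equality(String attr, String value) {
        return "E.ATTNAME = '" + attr + "' AND E.ATTVAL = '" + value + "'";
    }

    // (attr=ab*cd) : substrings map more or less onto SQL LIKE
    static String substrings(String attr, String likePattern) {
        return "E.ATTNAME = '" + attr + "' AND E.ATTVAL LIKE '" + likePattern + "'";
    }

    // (attr>=value) : ordering really needs the schema's ORDERING rule;
    // plain string comparison is only correct for some syntaxes
    static String greaterOrEqual(String attr, String value) {
        return "E.ATTNAME = '" + attr + "' AND E.ATTVAL >= '" + value + "'";
    }
}

The approx, extensible match and real matching rule cases are exactly 
the parts that do not fit into such a sketch.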

>
> One would have to see how efficient this join becomes for huge 
> directories ...

Well, not necessarily worse than doing in-memory joins like we do with 
our B-trees :) At least, RDBMSes are very good at that, not to mention 
the optimizations you can easily do with the RDBMS's help (Oracle has 
incredible tools to analyze a query).


So far, the subject is damn interesting, but not as simple as it seems 
at first glance. But I do think it is worth the effort, definitely.

Emmanuel.

PS : as this is pretty much Dev matter, what about switching to the dev 
ML ? (dev@directory.apache.org)

Re: JDBC/Derby backend

Posted by Juergen Weber <we...@gmail.com>.
Of course this should be

select * from DNS D, ENTRIES E where D.EID = E.EID and D.DN like
'o=sevenSeas%'
and E.ATTNAME = 'givenName' and E.ATTVAL = 'William'

Re: JDBC/Derby backend

Posted by Juergen Weber <we...@gmail.com>.
On 2/8/07, Emmanuel Lecharny <el...@gmail.com> wrote:
>
> What is important about using an RDBMS as a backend is not only
> performance. You have much more than that :
> - public acceptance : Oracle, DB2, ... are almost everywhere. It would
> be bad to ignore this fact
> - fault tolerance/reliability : do I have to add anything?
> - transaction support : RDBMSes support it natively
> - knowledge : so many DBAs, so few JDBM programmers...
> etc...

- Database backup and recovery

Yes, I too think that these points are more important than performance alone.


> > But in this same FAQ entry Kurt Zeilenga gives a link to an IBM paper
> > describing their database backend which quotes good performance (which
> > matches our user experience with the z/OS LDAP server).
> > http://www.research.ibm.com/journal/sj/392/shi.html
>
> They are both interesting papers, but both a little bit outdated.
> Anyway, the rationale for using something different from an RDBMS is
> pretty clear.

I think for a JDBC backend the actual database code should be trivial
in comparison with the database schema and the mapping of LDAP queries to
SQL queries. Probably one would end up with something quite similar to
the IBM paper (is that patented in the US?)

Or a quite trivial mapping:

DNS: DN | EID

and

ENTRIES: EID | ATTNAME | ATTVAL


So ldapsearch  -b "o=sevenSeas"   "(givenName=William)"

would map to

select * from DNS D, ENTRIES E where D.EID = E.EID and D.EID like
'o=sevenSeas%'
and E.ATTNAME = 'givenName' and E.ATTVAL = 'William'

One would have to see how efficient this join becomes for huge directories ...
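
As a sketch of how small the actual JDBC code would be (it assumes an 
open Connection, for instance to embedded Derby; the column sizes are 
made up; note the suffix test is on D.DN, and the LIKE pattern assumes 
DNs are stored suffix-first):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class TrivialMappingSketch {

    static void createTables(Connection con) throws SQLException {
        try (Statement st = con.createStatement()) {
            st.executeUpdate("CREATE TABLE DNS (DN VARCHAR(512), EID BIGINT)");
            st.executeUpdate(
                "CREATE TABLE ENTRIES (EID BIGINT, ATTNAME VARCHAR(64), ATTVAL VARCHAR(512))");
        }
    }

    // ldapsearch -b "o=sevenSeas" "(givenName=William)"
    static List<String> search(Connection con) throws SQLException {
        String sql = "SELECT D.DN FROM DNS D, ENTRIES E WHERE D.EID = E.EID"
                   + " AND D.DN LIKE ? AND E.ATTNAME = ? AND E.ATTVAL = ?";
        List<String> result = new ArrayList<>();
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, "o=sevenSeas%");
            ps.setString(2, "givenName");
            ps.setString(3, "William");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(rs.getString(1));
                }
            }
        }
        return result;
    }
}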


Juergen

Re: JDBC/Derby backend

Posted by Emmanuel Lecharny <el...@gmail.com>.
Juergen Weber wrote:

> On 2/7/07, Alex Karasulu <ak...@apache.org> wrote:
>
>>
>> >> Has anybody tried a Derby backend for ds?
>>
>> BTW I inquired about using Derby with Debrunner a few years ago at an
>> AC.  He basically stated that it would be a bad move since Derby would
>> be challenged to deal with hierarchies.
>
>
> Well, IBM's LDAP Server for z/OS does use DB2 as backend and it's very 
> fast.

Sure. I have already seen IBM IDS + DB2 managing more than 70,000,000 
entries without a problem.

<note>
I don't want to enter into a religious war here...
</note>

But "very fast" means almost nothing, if you can't compare it to 
something else. How "Very fast" is faster than "fast" or "slow" ?

What is important about using an RDBMS as a backend is not only 
performance. You have much more than that :
- public acceptance : Oracle, DB2, ... are almost everywhere. It would 
be bad to ignore this fact
- fault tolerance/reliability : do I have to add anything?
- transaction support : RDBMSes support it natively
- knowledge : so many DBAs, so few JDBM programmers...
etc...

<note>
End of the potential religious war. You are now returning to the 
agnostic zone :)
</note>

>
> Of course you have to map hierarchical data structures to tables but
> IBM showed that this is possible.

It is definitely possible. No doubt.

>
> This http://www.openldap.org/faq/data/cache/378.html OpenLDAP FAQ
> entry more or less discourages the use of an SQL backend.
> But in this same FAQ entry Kurt Zeilenga gives a link to an IBM paper
> describing their database backend which quotes good performance (which
> matches our user experience with the z/OS LDAP server).
> http://www.research.ibm.com/journal/sj/392/shi.html

They are both interesting papers, but both a little bit outdated. 
Anyway, the rationale for using something different from an RDBMS is 
pretty clear.

>
> A JDBC backend for DS could combine fast Java network and data
> structure handling with a fast Enterprise DBMS.
>
> For embedded use Derby should serve well.
>
> End of '05 I ran some transaction tests against Derby and got about 20
> TX/sec on my PC which seemed very fast to me.

FYI, as of mid-June 2006, we did some tests and compared ADS and 
OpenLDAP on a simple desktop (one CPU, 3 GHz, 1 GB). OpenLDAP was able 
to do up to 1500 searches per second, and ADS went up to 900 search req/s.

Does it mean the numbers you obtained are slow? No. My bet is that they 
will be perfectly OK for 99% of all the LDAP servers installed around 
the world. And you may also use a more powerful computer to obtain 
better performance.

> As transactions are not very important for LDAP servers, only read
> performance would matter.

Very true. This is why your 20 TX/sec are not really meaningful for an 
LDAP server. Without transactions, I won't be surprised if we get much 
better numbers.

> It would definitely be interesting to have numbers for Berkeley DB
> (which of course is not relational).

Well, I think it's close to the numbers we have found.

But I want to stress a point : those numbers mean nothing at all by 
themselves. The question is not to have the fastest LDAP server on 
earth, but much more to have a powerful server which can evolve and is 
easy to manage, easy to install, reliable, scalable, easy to support...

Using an RDBMS as a backend is a must in this kind of scenario (IMHO).

Now, you also have to weigh this against other points :
- workforce to implement it : we are all volunteers, working on ADS 
outside our day jobs, during evenings, nights and weekends
- user needs : we don't really have time to test and experiment with all 
the possible solutions. So it's all about choices. We make choices every 
day, all the time. I hope those choices are not the worst we could make, 
but we may be wrong from time to time :)
- knowledge : sometimes you favor one technology over another because of 
limited knowledge (you might feel so comfortable with a piece of software 
that you prefer to use it well rather than make limited use of another 
component you don't really know)

Here we are. We really want to use an RDBMS for many reasons. We just 
haven't found the time to explore this solution up to now. We need other 
volunteers to help us :)

Just join the effort! We will really appreciate new blood and new 
vision : the Apache community is all about that!

Emmanuel

>
> Juergen
>
PS: We already use Derby inside Apache DS (in the 1.5 version) : the 
Multi-Master replication mechanism uses Derby as a repository.

Re: JDBC/Derby backend

Posted by Alex Karasulu <ak...@apache.org>.
Juergen Weber wrote:
> On 2/7/07, Alex Karasulu <ak...@apache.org> wrote:
> 
>>
>> >> Has anybody tried a Derby backend for ds?
>>
>> BTW I inquired about using Derby with Debrunner a few years ago at an
>> AC.  He basically stated that it would be a bad move since Derby would
>> be challenged to deal with hierarchies.
> 
> Well, IBM's LDAP Server for z/OS does use DB2 as backend and it's very 
> fast.

Great! Are you interested in writing a JDBC based backend to do this?  I 
could lend you a hand if you're interested in playing with the idea.

I could help submit your patches until you gain karma.

> Of course you have to map hierarchical data structures to tables but
> IBM showed that this is possible.

Yes, it is completely possible, yet not very efficient, but it's worth a 
try.  Let's give it a shot.  You interested?

...

> A JDBC backend for DS could combine fast Java network and data
> structure handling with a fast Enterprise DBMS.
> 
> For embedded use Derby should serve well.
> 
> End of '05 I ran some transaction tests against Derby and got about 20
> TX/sec on my PC which seemed very fast to me.
> As transactions are not very important for LDAP servers, only read
> performance would matter.
> It would definitely be interesting to have numbers for Berkeley DB
> (which of course is not relational).
> 

We used to use BDB with the JNI interface until we found out that jdbm 
was much faster without having to double copy buffers going across the 
JNI interface.

I bet JE is much better than JDBM, but we cannot use it here at the ASF. 
Perhaps later we will write a new, improved partition (backend) 
implementation at safehaus using JE because of these licensing issues.

Regards,
Alex