Posted to dev@directory.apache.org by Alex Karasulu <ak...@apache.org> on 2008/03/23 02:30:26 UTC

[LDAP] [Client] Client side Cursors can help w/ LDAP paging and notification

While implementing the Cursor pattern in the JDBM based partition, some
interesting ideas came to mind regarding the new LDAP Client API we've been
working on.  Up to now we have simply defined what Entries look like and
have called it the Entry API instead.  Eventually this will grow into a full
client as we combine more of the pieces together.  JNDI is a less than
optimal API for LDAP.

Cursors are special, in contrast to the NamingEnumerations we use, because
they can be positioned after creation.  Cursors can be positioned using the
beforeFirst(), first(), last() and afterLast() methods at any point during
their use.  Furthermore, Cursors are bidirectional: callers can traverse
results in either direction at any time after creation with calls to next()
and previous().
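To make the positioning semantics concrete, here is a simplified sketch of such a Cursor in Java. The method names mirror those described above, but the real ApacheDS interface has more to it (close(), checked exceptions, etc.), and the ListCursor shown is only an in-memory illustration, not server code.

```java
import java.util.List;

// Simplified sketch of a bidirectional, positionable Cursor, modeled on
// the semantics described above; the real ApacheDS interface differs.
interface Cursor<E> {
    void beforeFirst();          // position before the first element
    void afterLast();            // position after the last element
    boolean first();             // jump to the first element
    boolean last();              // jump to the last element
    boolean next();              // advance; false when past the end
    boolean previous();          // go back; false when before the start
    E get();                     // element at the current position
}

// Minimal in-memory implementation over a list, for illustration only.
class ListCursor<E> implements Cursor<E> {
    private final List<E> list;
    private int pos = -1;        // -1 = before first, list.size() = after last

    ListCursor(List<E> list) { this.list = list; }

    public void beforeFirst() { pos = -1; }
    public void afterLast()   { pos = list.size(); }
    public boolean first()    { pos = list.isEmpty() ? -1 : 0; return pos == 0; }
    public boolean last()     { pos = list.size() - 1; return pos >= 0; }
    public boolean next()     { if (pos < list.size()) pos++; return pos < list.size(); }
    public boolean previous() { if (pos >= 0) pos--; return pos >= 0; }
    public E get()            { return list.get(pos); }
}
```

Note how the cursor can be repositioned at any time after creation, which a NamingEnumeration cannot do.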

The Partition interface in ApacheDS will soon expose search results by
returning a Cursor<ServerEntry> instead of a NamingEnumeration.  Depending
on the filter used, this is a composite Cursor which leverages partition
indices to position itself without having to buffer results.  This allows
the server to pull entries satisfying the search scope and filter one at a
time and return them to the client without massive latency or memory
consumption.  It also means we can process many more concurrent requests as
well as process single requests faster.  In addition, a resultant search
Cursor can be advanced or positioned using the methods described above just
by having nested Cursors based on indices advance or jump to the appropriate
Index positions.  We already have some of these footprint benefits with
NamingEnumerations; however, NamingEnumerations lack the positioning and
advancing capabilities.

During the course of this work, I questioned whether or not client side
Cursors would be possible.  Perhaps not under the current protocol without
some controls or extended operations.  Technical barriers in the protocol
aside, I started to dream about how this neat capability could impact
clients.  With it, clients can advance or position themselves over the
results of a search as they like.  Clients may even be able to freeze a
search in progress, by having the server tuck away the server side Cursor's
state, to be revisited later.  For lack of terms I've likened this to a form
of asynchronous bidirectional LDAP search. This would eliminate the need to
bother with paging controls.  It could even be used to eliminate the thread
per search problem associated with persistent search.  OK, let me stop
dreaming and start looking at reality so we can determine if this is even a
possibility.

So these characteristics of a Cursor have a profound impact on the semantics
of a search operation - not talking about the protocol yet.  I'm referring
to search as seen from the perspective of client callers using the Cursor:
the front end.  As stated, search operations can be initiated and shelved to
persist the state of the search by tucking away the Cursor in the connection
session.  A Cursor for a search automatically tracks its position.

However the protocol imposes some limitations on being able to leverage
these capabilities across the network on an LDAP client.  A search request
begins the search, and entry responses are received from the server until
the server returns a search result done response, which signals the end of
the search operation.  During this sequence, without creative extended
operations or controls, there's little the client can do to influence the
entries returned by the server or to throttle the rate of return.  Of course
size and time limits can be set on the search request, but after issuing the
search these cannot be altered.  Because the LdapMessage envelope contains a
messageId, and all responses contain the messageId of the request they
correspond to, the protocol allows for multiple requests to be issued in
parallel.  Even if client APIs do not allow for it, this is certainly
possible.

Although I've long forgotten how the paging control works exactly, I still
have a rough idea: forgive me for my laziness and if I'm missing something.
A control specifies some number of results to return per page, and the
server complies by limiting the search to that number then capping off the
search operation with a search result done.  Cookies in the request and
response controls are used to track the progress, so another search request
for the next page returns the next page rather than initiating the search
from the start.  This breaks a big search up into many smaller search
requests.  This way the client has the ability to intervene in what
otherwise would be a long flood of results in a single large search
operation.  If this paging control could also convey information about
positioning and directionality, along with a page size set to 1, we could
implement client side Cursors with the same capabilities they possess on the
server.  Paging search results effectively has the server tucking away the
search Cursor state into the client session and pulling it out again to
continue.  This is how we would implement this control today (that is, if
anyone gets the time to do so :) ).
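For reference, the standard JNDI classes already expose this control on the client side. The sketch below pages through a search using PagedResultsControl, feeding the cookie from each PagedResultsResponseControl back into the next request; the server URL, base DN and filter are made-up placeholders.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;
import javax.naming.ldap.Control;
import javax.naming.ldap.InitialLdapContext;
import javax.naming.ldap.LdapContext;
import javax.naming.ldap.PagedResultsControl;
import javax.naming.ldap.PagedResultsResponseControl;

class PagedSearchExample {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://localhost:10389"); // placeholder server
        LdapContext ctx = new InitialLdapContext(env, null);

        int pageSize = 100;
        ctx.setRequestControls(new Control[] {
            new PagedResultsControl(pageSize, Control.CRITICAL) });

        byte[] cookie;
        do {
            // Each request returns one page capped off by a search result done.
            NamingEnumeration<SearchResult> results =
                ctx.search("ou=system", "(objectClass=*)", new SearchControls());
            while (results.hasMore()) {
                System.out.println(results.next().getNameInNamespace());
            }

            // The cookie in the response control tracks the server-side progress.
            cookie = null;
            Control[] controls = ctx.getResponseControls();
            if (controls != null) {
                for (Control c : controls) {
                    if (c instanceof PagedResultsResponseControl) {
                        cookie = ((PagedResultsResponseControl) c).getCookie();
                    }
                }
            }
            ctx.setRequestControls(new Control[] {
                new PagedResultsControl(pageSize, cookie, Control.CRITICAL) });
        } while (cookie != null); // an absent cookie means the search is done

        ctx.close();
    }
}
```

With a page size of 1 this loop degenerates into exactly the entry-at-a-time pull described above, minus any positioning or directionality.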

Persistent search is unrelated; however, I'd like to explore whether or not
there's some possible synergy/relationship between it and paging.
(Persistent search IMHO is poorly implemented, but we can deal with it.)  It
is intended for receiving change notifications.  A persistent search control
is issued on a search request which may return a bunch of entry responses
for the filter if requested, but the most notable thing is that the search
does not end with a search result done response.  The operation persists to
return entries satisfying the filter along with a response control
containing the change type when entries satisfying the filter change
accordingly.  Clients usually need to assign a thread to listen for such
responses.  Smart clients will use a single thread instead of one per
persistent search.  Even smarter clients will use a single thread to listen
for search responses on persistent search requests and for unsolicited
notifications.  Regardless, once the persistent search request is issued,
there's no way the client can stop it until size and time limits are
reached.  These are parameters in the control sent on the search request.
Of course persistent search requests can use a size limit of 1, and clients
can request another persistent search after an event is received to gain
more control.  Regardless, a change may not occur for a while, in which case
fine tuning with the time limit will help.
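Since JNDI ships no persistent search support, a client wanting to experiment with it has to hand-roll the control. A minimal sketch, assuming the OID and BER layout from the expired psearch draft (a SEQUENCE of changeTypes INTEGER, changesOnly BOOLEAN, returnECs BOOLEAN); the encoder below is deliberately simplistic:

```java
import javax.naming.ldap.BasicControl;

// Hand-rolled persistent search control; OID and value layout taken from
// the draft-ietf-ldapext-psearch specification.
class PersistentSearchControl extends BasicControl {
    static final String OID = "2.16.840.1.113730.3.4.3";

    // changeTypes is a bit mask: add=1, delete=2, modify=4, modDN=8 (15 = all).
    PersistentSearchControl(int changeTypes, boolean changesOnly, boolean returnECs) {
        super(OID, true, encode(changeTypes, changesOnly, returnECs));
    }

    private static byte[] encode(int changeTypes, boolean changesOnly, boolean returnECs) {
        // Minimal DER encoding, valid only for changeTypes in 0..127.
        if (changeTypes < 0 || changeTypes > 127) {
            throw new IllegalArgumentException("changeTypes out of range");
        }
        return new byte[] {
            0x30, 0x09,                                   // SEQUENCE, length 9
            0x02, 0x01, (byte) changeTypes,               // INTEGER  changeTypes
            0x01, 0x01, (byte) (changesOnly ? 0xFF : 0),  // BOOLEAN  changesOnly
            0x01, 0x01, (byte) (returnECs ? 0xFF : 0)     // BOOLEAN  returnECs
        };
    }
}
```

The resulting control would be attached to a search request via LdapContext.setRequestControls(), after which the client still needs a thread parked on the response stream, which is exactly the problem discussed above.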

It's a bit crazy to think what would happen if both these controls are used
together on the same search request.  I guess it all depends on the server
implementation in the end.  Not sure if anyone would even want to do this.
Regardless of the pain it would entail, I think this situation can be
managed to work in the server.  Now where does this lead us, though?  Perhaps
the Cursor interface could be enhanced to support a listener to
asynchronously notify users of changes to the underlying results.  The
Cursor can then be reset (or just repositioned).  Of course this is all
presuming the Cursor was created to traverse results.  Instead a Cursor
might just be created to iterate over only changes, but does this make
sense?  Whatever the answer, at least we can know when underlying results
have changed to invalidate the Cursor on the client.   This is probably a
bullsh*t idea but it's entertaining to think about.
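The listener idea above could look something like this; every name here is invented for illustration and is not part of any existing ApacheDS API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener for the "notify on underlying change" idea; a
// server-side event (e.g. a psearch-style notification) would drive it.
interface CursorChangeListener {
    void resultsChanged();
}

// Hypothetical wrapper: tracks listeners and invalidates itself when told
// the underlying result set has changed, so the client knows to reposition
// or re-create the cursor.
class ObservableCursor {
    private final List<CursorChangeListener> listeners = new ArrayList<>();
    private boolean valid = true;

    void addListener(CursorChangeListener l) { listeners.add(l); }

    boolean isValid() { return valid; }

    // Called when a change notification for the underlying results arrives.
    void fireResultsChanged() {
        valid = false;
        for (CursorChangeListener l : listeners) {
            l.resultsChanged();
        }
    }
}
```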

BTW change notifications are probably best implemented as a combination of
search and extended operations through unsolicited notifications.   The
client issues a search request with a control similar to the persistent
search request control.  Instead of 'persisting' the search, the search
returns immediately with a search result done response using a result code
to indicate whether or not the server will honor the request to be notified
of changes.  Then the client is done registering the request to be
notified.  Whenever the server detects changes that satisfy the changeType,
scope and filter of the notification registration, it sends an unsolicited
extended response to the client.  The payload carries information about the
change which took place similar to the way search entry results do with the
response control of persistent search.   The client can issue a
deregistration message to the server to stop receiving these notifications
using an ExtendedRequest.  The server would respond to this with an
ExtendedResponse.  IMO this is a much better mechanism with full control
over the subscription and notification process.
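A purely hypothetical client-side stub of this registration scheme might look as follows; all names are invented, and a real implementation would send an ExtendedRequest to deregister and dispatch on unsolicited ExtendedResponses, which the in-memory deliver() call stands in for here:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the register/notify/deregister flow described
// above; this is an in-memory stand-in, not a protocol implementation.
class ChangeNotificationRegistry {
    private final Map<String, Runnable> registrations = new HashMap<>();

    // Registration: the search request returns immediately with a result
    // code; the id returned here models the handle carried back later in
    // unsolicited responses.
    String register(String baseDn, String filter, Runnable onChange) {
        String id = UUID.randomUUID().toString();
        registrations.put(id, onChange);
        return id;
    }

    // Deregistration: modeled on an ExtendedRequest/ExtendedResponse pair.
    void deregister(String id) {
        registrations.remove(id);
    }

    // Stand-in for an unsolicited extended response arriving from the server.
    void deliver(String id) {
        Runnable r = registrations.get(id);
        if (r != null) {
            r.run();
        }
    }
}
```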

Alex

Re: [LDAP] [Client] Client side Cursors can help w/ LDAP paging and notification

Posted by Alex Karasulu <ak...@apache.org>.
On Sun, Mar 23, 2008 at 8:17 PM, Howard Chu <hy...@symas.com> wrote:

> Alex Karasulu wrote:
> > On Sun, Mar 23, 2008 at 6:15 AM, Howard Chu <hyc@symas.com
> > <ma...@symas.com>> wrote:
> >
> >     Using cursors for walking entry lists (and saving the cursor state)
> is
> >     certainly useful inside the server, but there's nothing you can
> >     safely gain
> >     from the client side.
> >
> >     It kind of sounds like you're talking about Virtual List Views, not
> >     paged
> >     results. Remember that search responses in LDAP/X.500 are unordered
> by
> >     definition. Therefore it makes no sense for a standards-compliant
> >     client to
> >     send an initial request with a Paging control saying "start at
> responses
> >     200-300" because the order in which entries will be returned is not
> >     defined.
> >     You need something like VLV which requires SSS to even begin
> >     thinking about
> >     this; it's not a job for Paged Results.
> >
> >
> > Yes you're totally right, I've confused the two controls. Thanks for the
> > correction.
> >
> > Either way we still need to implement both. Does OpenLDAP implements
> > both of these controls? Any opinions or advice regarding implementation
> > and or the actual utility of these controls?
>
> OpenLDAP currently has a no-op implementation of SSS and no VLV. There's
> been
> some discussion about implementing them, but no one has been interested
> enough
> to do it so far. The main problem being that SSS gets to be rather
> annoying in
> the context of a truly distributed DIT, plus the CPU time involved makes
> the
> whole proposition unscalable. This is one of those cases where, given 100s
> of
> clients talking to a single server, it makes more sense to distribute the
> work
> to the clients than to run hundreds of sort variations on the single
> server.
>
> Netscape/iPlanet obviously took the opposite view, and decided there would
> probably be small enough variation in the types of searches being used
> that
> you could index them explicitly to avoid the overhead. We may follow on
> that
> road as well, and just return UnwillingToPerform for searches that were
> not
> indexed.


This is a sane strategy.  To implement SSS you need an index on the
attribute to sort results on and you're just not going to have that all the
time.  Just one bad search can negatively impact other concurrent search
requests by drawing more CPU or unnecessarily turning over the cache with
full table scans.

A while back I got an idea on this that might have some value.  The strategy
above could be taken with an optional referral to a replica that may contain
the index to satisfy the VLV-SSS request.  Often I found having one heavily
indexed replica (the fat bastard server), helps with ad hoc queries
composing 2-5% of the search requests.  The other 95-98% of the search
requests could then be handled by the standard set of slim replicas.  So if
any of the slim replicas find they cannot efficiently handle the request due
to a lack of indices, they would then send back a referral to the fat
bastard.

The slim replicas should be able to service the majority of queries rapidly
sending only those few requests they cannot to the fat bastard.  The fat
bastard would do more work to keep up with DIT changes but it should avoid
breathing hard on the ad hoc searches.  The additional index maintenance
overhead on the fat bastard is worth it if it keeps these nasty queries off
of the slim replicas, saving them from performance degradation and cache
turnover.

Alex

Re: [LDAP] [Client] Client side Cursors can help w/ LDAP paging and notification

Posted by Howard Chu <hy...@symas.com>.
Alex Karasulu wrote:
> On Sun, Mar 23, 2008 at 6:15 AM, Howard Chu <hyc@symas.com
> <ma...@symas.com>> wrote:
>
>     Using cursors for walking entry lists (and saving the cursor state) is
>     certainly useful inside the server, but there's nothing you can
>     safely gain
>     from the client side.
>
>     It kind of sounds like you're talking about Virtual List Views, not
>     paged
>     results. Remember that search responses in LDAP/X.500 are unordered by
>     definition. Therefore it makes no sense for a standards-compliant
>     client to
>     send an initial request with a Paging control saying "start at responses
>     200-300" because the order in which entries will be returned is not
>     defined.
>     You need something like VLV which requires SSS to even begin
>     thinking about
>     this; it's not a job for Paged Results.
>
>
> Yes you're totally right, I've confused the two controls. Thanks for the
> correction.
>
> Either way we still need to implement both. Does OpenLDAP implements
> both of these controls? Any opinions or advice regarding implementation
> and or the actual utility of these controls?

OpenLDAP currently has a no-op implementation of SSS and no VLV. There's been 
some discussion about implementing them, but no one has been interested enough 
to do it so far. The main problem being that SSS gets to be rather annoying in 
the context of a truly distributed DIT, plus the CPU time involved makes the 
whole proposition unscalable. This is one of those cases where, given 100s of 
clients talking to a single server, it makes more sense to distribute the work 
to the clients than to run hundreds of sort variations on the single server.

Netscape/iPlanet obviously took the opposite view, and decided there would 
probably be small enough variation in the types of searches being used that 
you could index them explicitly to avoid the overhead. We may follow on that 
road as well, and just return UnwillingToPerform for searches that were not 
indexed.
-- 
   -- Howard Chu
   Chief Architect, Symas Corp.  http://www.symas.com
   Director, Highland Sun        http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP     http://www.openldap.org/project/

Re: [LDAP] [Client] Client side Cursors can help w/ LDAP paging and notification

Posted by Alex Karasulu <ak...@apache.org>.
On Sun, Mar 23, 2008 at 6:15 AM, Howard Chu <hy...@symas.com> wrote:

> Using cursors for walking entry lists (and saving the cursor state) is
> certainly useful inside the server, but there's nothing you can safely
> gain
> from the client side.
>
> It kind of sounds like you're talking about Virtual List Views, not paged
> results. Remember that search responses in LDAP/X.500 are unordered by
> definition. Therefore it makes no sense for a standards-compliant client
> to
> send an initial request with a Paging control saying "start at responses
> 200-300" because the order in which entries will be returned is not
> defined.
> You need something like VLV which requires SSS to even begin thinking
> about
> this; it's not a job for Paged Results.
>

Yes you're totally right, I've confused the two controls.  Thanks for the
correction.

Either way we still need to implement both.  Does OpenLDAP implements both
of these controls?  Any opinions or advice regarding implementation and or
the actual utility of these controls?

Thanks,
Alex

Re: [LDAP] [Client] Client side Cursors can help w/ LDAP paging and notification

Posted by Emmanuel Lecharny <el...@gmail.com>.
Hi Howard,

On Sun, Mar 23, 2008 at 11:15 AM, Howard Chu <hy...@symas.com> wrote:
> Using cursors for walking entry lists (and saving the cursor state) is
>  certainly useful inside the server, but there's nothing you can safely gain
>  from the client side.

Well, the mechanism by itself can be used. The way we implemented the
browser would certainly benefit from such a mechanism. Of course, you
will have no guarantee that the data are up to date (and we have added
a refresh button just for this case).

>
>  It kind of sounds like you're talking about Virtual List Views, not paged
>  results. Remember that search responses in LDAP/X.500 are unordered by
>  definition. Therefore it makes no sense for a standards-compliant client to
>  send an initial request with a Paging control saying "start at responses
>  200-300" because the order in which entries will be returned is not defined.

Here, I don't think a standard client will benefit from such a
feature, that's for sure! However, due to the internal structure we
use (BTree), there is no reason why we should not let the client
benefit from this. Also, considering that the choice of clients is
quite poor atm, if you except jexplorer, ldapbrowser and softera - I
won't even mention the atrocious text-based GUI delivered by M$ ... -
I'm not too worried about the installed base :)

>  You need something like VLV which requires SSS to even begin thinking about
>  this; it's not a job for Paged Results.

Paged results can be extended a bit to offer more capabilities. Of
course, this would result in another RFC.
>
>  Even if you have a stable ordering (which SSS is actually unable to guarantee)
>  you still can't reliably identify response #200 since underlying entries may
>  be added or deleted while the search is progressed.

Yes, and this is something I mentioned in my previous mail. We also
need to have a decent cache system in order to guarantee that the
search result is 'fresh'. Not simple, but possible.

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: [LDAP] [Client] Client side Cursors can help w/ LDAP paging and notification

Posted by Howard Chu <hy...@symas.com>.
Using cursors for walking entry lists (and saving the cursor state) is 
certainly useful inside the server, but there's nothing you can safely gain 
from the client side.

It kind of sounds like you're talking about Virtual List Views, not paged 
results. Remember that search responses in LDAP/X.500 are unordered by 
definition. Therefore it makes no sense for a standards-compliant client to 
send an initial request with a Paging control saying "start at responses 
200-300" because the order in which entries will be returned is not defined. 
You need something like VLV which requires SSS to even begin thinking about 
this; it's not a job for Paged Results.

Even if you have a stable ordering (which SSS is actually unable to guarantee) 
you still can't reliably identify response #200 since underlying entries may 
be added or deleted while the search is progressed.

Emmanuel Lecharny wrote:
> Using cursors into ADS will also allow us to implement the Paging
> control (RFC 2696) so easily ! Even defining a new control (and a new
> RFC) as we will be able to go back and forth, which is not possible
> with the Paging control.

>> The Partition interface in ApacheDS will soon expose search results by
>> returning a Cursor<ServerEntry>  instead of a NamingEnumeration.
>
> It will be Cursor<Entry>  (as this is the top level interface)
>
> Depending
>> on the filter used, this is a composite Cursor which leverages partition
>> indices to position itself without having to buffer results.  This allows
>> the server to pull entries satisfying the search scope and filter one at a
>> time, and return it to the client without massive latency, or memory
>> consumption.  It also means we can process many more concurrent requests as
>> well as process single requests faster.  In addition a resultant search
>> Cursor can be advanced or position using the methods described above just by
>> having nested Cursors based on indicies advance or jump to the appropriate
>> Index positions.  We already have some of these footprint benefits with
>> NamingEnumerations, however the positioning and advancing capabilities are
>> not present with NamingEnumerations.
>
> Further experiments and researches will help a lot here. We may have
> problems too, as this will be a concurrent part : some of the data may
> be modified while the cursor is being read.
>
>> During the course of this work, I questioned whether or not client side
>> Cursors would be possible.  Perhaps not under the current protocol without
>> some controls or extended operations.  Technical barriers in the protocol
>> aside, I started to dream about how this neat capability could impact
>> clients.  With it, clients can advance or position themselves over the
>> results of a search as they like.  Clients may even be able to freeze a
>> search in progress, by having the server tuck away the server side Cursor's
>> state, to be revisited later.
>
> The major improvement with Client cursors is that the client won't
> have anymore to manage a cache of data. Thinking about the Studio, if
> you browse a big tree with thousands of entries, when you want to get
> the entries from [200-300] - assuming you show entries by 100 blocks -
> you have to send another search request _or_ you have to cache all the
> search results in memory. What a waste of time or a waste of memory !
> If we provide such a mechanism, the client won't have to bother with
> such complexity. Data will be brought to the client pieces by pieces :
> if the client want numbe 400 to 500, no need to get the 499 first
> entries. If the client already pumped out the first 100 entries, it's
> just a simple request on the same cursor, no need to compute it again.
>
> So, yes, client cursors make sense too.
>
> For lack of terms I've likened this to a form
>> of asynchronous bidirectional LDAP search. This would eliminate the need to
>> bother with paging controls.  It could even be used to eliminate the thread
>> per search problem associated with persistent search.  OK, let me stop
>> dreaming and start looking at reality so we can determine if this is even a
>> possibility.
>
> Reality is just a dream became true :) (sometime, it's a nightmare :)
>
>> So these characteristics of a Cursor have a profound impact on the semantics
>> of a search operation - not talking about the protocol yet.  I'm referring
>> to search as seen from the perspective of client callers using the Cursor:
>> the front end.  As stated search operations can be initiated and shelved to
>> persist the state of the search by tucking away the Cursor in the connection
>> session.  A Cursor for a search will automatically track it's position.
>>
>> However the protocol imposes some limitations on being able to leverage
>> these capabilities across the network on an LDAP client.  A search request
>> begins the search, and entry responses are received from the server, until
>> the server returns a search response done operation which  signals the end
>> of the search operation.  During this sequence, without creative extended
>> operations, or controls, there's little the client can do to influence the
>> entries returned by the server or throttle the rate of return.  Of course
>> size and time limits can be set on the search request but after issuing the
>> search, these cannot be altered.  Because the LdapMessage envelop contains a
>> messageId, and all responses contain the messageId of the request they
>> correspond to, the protocol allows for multiple requests to be issued in
>> parallel.  Even if client API's do not allow for it, this is certainly
>> possible.
>
> The main point is that each client is associated with a session. It's
> then easy to handle a context and use it to store meta data (like a
> previously created cursor on some search request, cursor which can be
> reused if the underlying data have not been modified).
>
> That bring another matter on the table : if we want to reuse cursors,
> we _must_ implement a decent entry cache.
>> Although I've long forgotten how the paging control works exactly, I still
>> have a rough idea: forgive me for my laziness and if I'm missing something.
>> A control specifies some number of results to return per page, and the
>> server complies by limiting the search to that number then capping off the
>> search operation with a search result done.  Cookies in the request and
>> response controls are used to track the progress, so another search request
>> for the next page returns the next page rather than initiating the search
>> from the start.  This breaks a big search up into many smaller search
>> requests.
>
> This is true from the client perspective. On the server, there should
> be only one search, and the returned results are just waiting for
> another search with the same cookie.
>
> This way the client has the ability to intervene in what
>> otherwise would be a long flood of results in a single large search
>> operation.  If this page control could also convey information about
>> positioning, and directionality, along with a page size set to 1, we could
>> implement client side Cursors with the same capabilities they posses on the
>> server.
>
> Exactly ! For instance, using negative size would result if going
> backward. This is a very minor extension to the paged search RFC, and
> it can even be implemented using the very same control, simply adding
> some semantic to it.
>
> Another extension would be to add a 'position' to start with.
>
> Paging search results effectively has the server tucking away the
>> search Cursor state into the client session and pulling it out again to
>> continue.  This is how we would implement this control today (that is if
>> anyone gets the time to do so :) ).

>> BTW change notifications are probably best implemented as a combination of
>> search and extended operations through unsolicited notifications.   The
>> client issues a search request with a control similar to the persistent
>> search request control.  Instead of 'persisting' the search, the search
>> returns immediately with a search result done response using a result code
>> to indicate whether or not the server will honor the request to be notified
>> of changes.
>
> This is a big semantic shift... Not sure that it will fit with the
> current LDAP protocol. However, LDAP V4 does not exist yet ;)

There's no reason this approach can't be used in LDAPv3. Just that no existing 
LDAPv3 clients or servers have support for such a control at the moment.
-- 
   -- Howard Chu
   Chief Architect, Symas Corp.  http://www.symas.com
   Director, Highland Sun        http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP     http://www.openldap.org/project/

Re: [LDAP] [Client] Client side Cursors can help w/ LDAP paging and notification

Posted by Alex Karasulu <ak...@apache.org>.
On Sun, Mar 23, 2008 at 2:28 PM, Alex Karasulu <ak...@apache.org> wrote:

> On Sun, Mar 23, 2008 at 5:35 AM, Emmanuel Lecharny <el...@gmail.com>
> wrote:
> Right now of course theirs no way I can control this so we do get dirty
> reads.  I can however
>

er, s/theirs/there's/

Alex

Re: [LDAP] [Client] Client side Cursors can help w/ LDAP paging and notification

Posted by Alex Karasulu <ak...@apache.org>.
On Sun, Mar 23, 2008 at 5:35 AM, Emmanuel Lecharny <el...@gmail.com>
wrote:

> Hi Alex,
>

...


> We have to extend this [client] API to cover all the LDAP
> operation (connect, disconnect, send and receive messages, controls,
> etc)
>

I'd love to base our delegated authentication and proxy capabilities on this
new API rather than polluting the server with more JNDI code.  I would also
rather see the GSoC project on LDAP Object Mapping use this API instead of
JNDI as well.  I think I'm done with JNDI forever - I can't stand it
anymore.

...


>
> > The Partition interface in ApacheDS will soon expose search results by
> > returning a Cursor<ServerEntry> instead of a NamingEnumeration.
>
> It will be Cursor<Entry> (as this is the top level interface)
>

I think you might have read the line above wrong; the Partition interface is
inside the server, so the Cursors it returns would traverse ServerEntry
objects, not Entry objects.


>
> Depending
> > on the filter used, this is a composite Cursor which leverages partition
> > indices to position itself without having to buffer results.  This
> allows
> > the server to pull entries satisfying the search scope and filter one at
> a
> > time, and return it to the client without massive latency, or memory
> > consumption.  It also means we can process many more concurrent requests
> as
> > well as process single requests faster.  In addition a resultant search
> > Cursor can be advanced or position using the methods described above
> just by
> > having nested Cursors based on indicies advance or jump to the
> appropriate
> > Index positions.  We already have some of these footprint benefits with
> > NamingEnumerations, however the positioning and advancing capabilities
> are
> > not present with NamingEnumerations.
>
> Further experiments and researches will help a lot here. We may have
> problems too, as this will be a concurrent part : some of the data may
> be modified while the cursor is being read.
>

You just stepped on a very interesting topic.  The topic of isolation.  One
of the documentation items I put into the definition of Cursors was the fact
that they are fully isolated.  However, we can vary that if we like.  They
could be set up to traverse results constrained to a specific revision and
below in the server.

Right now of course theirs no way I can control this so we do get dirty
reads.  I can however prevent this by leveraging an index on a revision
(once we define it) or even on the modifyTimestamp.  This can be used to
constrain the results returned by the Cursor so dirty reads do not occur.


>
> >
> > During the course of this work, I questioned whether or not client side
> > Cursors would be possible.  Perhaps not under the current protocol
> without
> > some controls or extended operations.  Technical barriers in the
> protocol
> > aside, I started to dream about how this neat capability could impact
> > clients.  With it, clients can advance or position themselves over the
> > results of a search as they like.  Clients may even be able to freeze a
> > search in progress, by having the server tuck away the server side
> Cursor's
> > state, to be revisited later.
>
> The major improvement with Client cursors is that the client won't
> have anymore to manage a cache of data. Thinking about the Studio, if
> you browse a big tree with thousands of entries, when you want to get
> the entries from [200-300] - assuming you show entries by 100 blocks -
> you have to send another search request _or_ you have to cache all the
> search results in memory. What a waste of time or a waste of memory !
> If we provide such a mechanism, the client won't have to bother with
> such complexity. Data will be brought to the client pieces by pieces :
> if the client want numbe 400 to 500, no need to get the 499 first
> entries. If the client already pumped out the first 100 entries, it's
> just a simple request on the same cursor, no need to compute it again.
>
> So, yes, client cursors make sense too.
>

Yeah I was thinking about using these constructs in Studio to make it more
efficient at dealing with very large directories.

Thankfully Howard pointed out that we need the VLV control with server side
sorting to achieve this, rather than just the PSR control.  The PSR control,
I guess, really just determines what would perhaps become some kind of batch
size the Cursor would work with.

...


> > This way the client has the ability to intervene in what
> > otherwise would be a long flood of results in a single large search
> > operation.  If this page control could also convey information about
> > positioning and directionality, along with a page size set to 1, we
> > could implement client side Cursors with the same capabilities they
> > possess on the server.
>
> Exactly ! For instance, using a negative size would mean going
> backward. This is a very minor extension to the paged search RFC, and
> it can even be implemented using the very same control, simply adding
> some semantics to it.
>

Again, so sorry, I confused the VLV and PSR controls.  Now that I have it
straight, I need to look at the draft specification for VLV, which looks
ancient.  I wonder why it never made it to RFC status.


>
> Another extension would be to add a 'position' to start with.
>

Yes, that would be neat; I have some ideas on this.  First I want to reread
the PSR RFC and the VLV draft just to get back up to speed with them.
Perhaps we need a new control (as a last resort), but I don't want to do
that if we don't have to.

Alex

Re: [LDAP] [Client] Client side Cursors can help w/ LDAP paging and notification

Posted by Emmanuel Lecharny <el...@gmail.com>.
Hi Alex,

On Sun, Mar 23, 2008 at 2:30 AM, Alex Karasulu <ak...@apache.org> wrote:
> While implementing the Cursor pattern in the JDBM based partition, some
> interesting ideas came to mind regarding the new LDAP Client API we've been
> working on.  Up to now we have simply defined what Entries look like and
> have called it the Entry API instead.  Eventually this will grow into a full
> client as we combine more of the pieces together.  JNDI is a less than
> optimal API for LDAP.

That's for sure ! We have to extend this API to cover all the LDAP
operations (connect, disconnect, send and receive messages, controls,
etc.)

>
> Cursors are special, in contrast to the NamingEnumerations we use, because
> they can be positioned after creation.  Cursors can be positioned using the
> beforeFirst(), first(), last() and afterLast() methods at any point during
> their use.  Furthermore, Cursors are bidirectional; they can be traversed in
> both directions with calls to next() and previous() to navigate results.
> Callers can advance a Cursor forward or reverse at any time after creation.

Using cursors in ADS will also allow us to implement the Paging
control (RFC 2696) so easily ! We could even define a new control (and a
new RFC), as we will be able to go back and forth, which is not possible
with the Paging control.
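The positioning contract discussed above can be sketched as follows; the method names follow the mail, but this toy `ListCursor` backed by an in-memory list is purely illustrative, not the ApacheDS implementation:

```java
import java.util.List;

// Illustrative bidirectional cursor: positionable before/after the result
// set, and traversable in both directions with next() and previous().
interface SimpleCursor<E> {
    void beforeFirst();
    void afterLast();
    boolean first();
    boolean last();
    boolean next();
    boolean previous();
    E get();
}

// Toy implementation over a buffered list; real server-side cursors would
// instead walk partition indices so nothing needs to be buffered.
final class ListCursor<E> implements SimpleCursor<E> {
    private final List<E> results;
    private int pos = -1;                       // -1 means "before first"

    ListCursor(List<E> results) { this.results = results; }

    public void beforeFirst() { pos = -1; }
    public void afterLast()   { pos = results.size(); }
    public boolean first()    { pos = 0; return !results.isEmpty(); }
    public boolean last()     { pos = results.size() - 1; return !results.isEmpty(); }

    public boolean next() {
        if (pos < results.size()) pos++;
        return pos < results.size();
    }

    public boolean previous() {
        if (pos >= 0) pos--;
        return pos >= 0;
    }

    public E get() {
        if (pos < 0 || pos >= results.size()) {
            throw new IllegalStateException("cursor not positioned on an element");
        }
        return results.get(pos);
    }
}
```

Note how `beforeFirst()`/`afterLast()` position the cursor just outside the result set, so the first `next()` or `previous()` lands on a real element.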

> The Partition interface in ApacheDS will soon expose search results by
> returning a Cursor<ServerEntry> instead of a NamingEnumeration.

It will be Cursor<Entry> (as this is the top level interface)

> Depending
> on the filter used, this is a composite Cursor which leverages partition
> indices to position itself without having to buffer results.  This allows
> the server to pull entries satisfying the search scope and filter one at a
> time, and return them to the client without massive latency or memory
> consumption.  It also means we can process many more concurrent requests as
> well as process single requests faster.  In addition, a resultant search
> Cursor can be advanced or positioned using the methods described above just
> by having nested Cursors based on indices advance or jump to the appropriate
> Index positions.  We already have some of these footprint benefits with
> NamingEnumerations; however, the positioning and advancing capabilities are
> not present with NamingEnumerations.

Further experiments and research will help a lot here. We may run into
problems too, as this is a concurrent area : some of the data may
be modified while the cursor is being read.

>
> During the course of this work, I questioned whether or not client side
> Cursors would be possible.  Perhaps not under the current protocol without
> some controls or extended operations.  Technical barriers in the protocol
> aside, I started to dream about how this neat capability could impact
> clients.  With it, clients can advance or position themselves over the
> results of a search as they like.  Clients may even be able to freeze a
> search in progress, by having the server tuck away the server side Cursor's
> state, to be revisited later.

The major improvement with client cursors is that the client won't
have to manage a cache of data anymore. Thinking about Studio, if
you browse a big tree with thousands of entries, when you want to get
the entries [200-300] - assuming you show entries in blocks of 100 -
you have to send another search request _or_ you have to cache all the
search results in memory. What a waste of time or a waste of memory !
If we provide such a mechanism, the client won't have to bother with
such complexity. Data will be brought to the client piece by piece :
if the client wants numbers 400 to 500, there's no need to fetch the
first 399 entries. If the client has already pumped out the first 100
entries, it's just a simple request on the same cursor, no need to
compute it again.

So, yes, client cursors make sense too.

> For lack of a better term, I've likened this to a form
> of asynchronous bidirectional LDAP search.  This would eliminate the need to
> bother with paging controls.  It could even be used to eliminate the thread
> per search problem associated with persistent search.  OK, let me stop
> dreaming and start looking at reality so we can determine if this is even a
> possibility.

Reality is just a dream come true :) (sometimes, it's a nightmare :)

>
> So these characteristics of a Cursor have a profound impact on the semantics
> of a search operation - not talking about the protocol yet.  I'm referring
> to search as seen from the perspective of client callers using the Cursor:
> the front end.  As stated search operations can be initiated and shelved to
> persist the state of the search by tucking away the Cursor in the connection
> session.  A Cursor for a search will automatically track its position.
>
> However the protocol imposes some limitations on being able to leverage
> these capabilities across the network on an LDAP client.  A search request
> begins the search, and entry responses are received from the server, until
> the server returns a search response done operation which  signals the end
> of the search operation.  During this sequence, without creative extended
> operations, or controls, there's little the client can do to influence the
> entries returned by the server or throttle the rate of return.  Of course
> size and time limits can be set on the search request but after issuing the
> search, these cannot be altered.  Because the LdapMessage envelope contains a
> messageId, and all responses contain the messageId of the request they
> correspond to, the protocol allows for multiple requests to be issued in
> parallel.  Even if client APIs do not allow for it, this is certainly
> possible.
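The messageId correlation that makes parallel requests possible can be sketched as a simple demultiplexer; the types here are illustrative, not taken from any real LDAP library:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Routes each incoming response to the handler registered for its
// messageId, allowing several outstanding requests on one connection.
final class ResponseDemultiplexer {
    private final AtomicInteger nextId = new AtomicInteger(1);
    private final Map<Integer, Consumer<String>> handlers = new ConcurrentHashMap<>();

    // Called when sending a request: allocates a messageId and remembers
    // which handler should receive the matching responses.
    int register(Consumer<String> handler) {
        int id = nextId.getAndIncrement();
        handlers.put(id, handler);
        return id;
    }

    // Called by the reader thread for every response envelope received;
    // `done` corresponds to the search result done message ending the exchange.
    void dispatch(int messageId, String response, boolean done) {
        Consumer<String> handler = handlers.get(messageId);
        if (handler != null) handler.accept(response);
        if (done) handlers.remove(messageId);
    }
}
```

A real client would carry decoded protocol messages instead of strings, but the routing logic is the same.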

The main point is that each client is associated with a session. It's
then easy to handle a context and use it to store metadata (like a
previously created cursor for some search request, a cursor which can be
reused if the underlying data has not been modified).

That brings another matter to the table : if we want to reuse cursors,
we _must_ implement a decent entry cache.
>
> Although I've long forgotten how the paging control works exactly, I still
> have a rough idea: forgive me for my laziness and if I'm missing something.
> A control specifies some number of results to return per page, and the
> server complies by limiting the search to that number then capping off the
> search operation with a search result done.  Cookies in the request and
> response controls are used to track the progress, so another search request
> for the next page returns the next page rather than initiating the search
> from the start.  This breaks a big search up into many smaller search
> requests.

This is true from the client's perspective. On the server, there should
be only one search; the remaining results are simply held, waiting for
another search request carrying the same cookie.
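The cookie round-trip described above is exactly what the standard JDK `PagedResultsControl` encodes (RFC 2696); this sketch only builds the request controls, since actually running the paged search would need a live LDAP server and an `InitialLdapContext`:

```java
import java.io.IOException;
import javax.naming.ldap.Control;
import javax.naming.ldap.PagedResultsControl;

final class PagingControls {
    // Control for the first page: no cookie yet, just the page size.
    static Control[] firstPage(int pageSize) {
        try {
            return new Control[] { new PagedResultsControl(pageSize, Control.CRITICAL) };
        } catch (IOException e) {
            throw new RuntimeException("failed to BER-encode the paging control", e);
        }
    }

    // Control for a follow-up page: the opaque cookie from the previous
    // PagedResultsResponseControl tells the server where to resume.
    static Control[] nextPage(int pageSize, byte[] cookie) {
        try {
            return new Control[] { new PagedResultsControl(pageSize, cookie, Control.CRITICAL) };
        } catch (IOException e) {
            throw new RuntimeException("failed to BER-encode the paging control", e);
        }
    }
}
```

In a real search loop you would call `ctx.setRequestControls(...)`, run the search, then pull the cookie out of the `PagedResultsResponseControl` in `ctx.getResponseControls()` and repeat until the cookie comes back empty.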

> This way the client has the ability to intervene in what
> otherwise would be a long flood of results in a single large search
> operation.  If this page control could also convey information about
> positioning and directionality, along with a page size set to 1, we could
> implement client side Cursors with the same capabilities they possess on
> the server.

Exactly ! For instance, using a negative size would mean going
backward. This is a very minor extension to the paged search RFC, and
it can even be implemented using the very same control, simply adding
some semantics to it.

Another extension would be to add a 'position' to start with.

> Paging search results effectively has the server tucking away the
> search Cursor state into the client session and pulling it out again to
> continue.  This is how we would implement this control today (that is if
> anyone gets the time to do so :) ).
>
> Persistent search is unrelated however I'd like to explore whether or not
> there's some possible synergy/relationship between it and paging.
> (Persistent search IMHO is poorly implemented but we can deal with it). It
> is intended for receiving change notifications.  A persistent search control
> is issued on a search request which may return a bunch of entry responses
> for the filter if requested, but the most notable thing is that the search
> does not end with a search result done response.  The operation persists to
> return entries satisfying the filter along with a response control
> containing the change type when entries satisfying the filter change
> accordingly.  Clients usually need to assign a thread to listen for such
> responses.  Smart clients will use a single thread instead of one per
> persistent search.  Even smarter clients will use a single thread to listen
> for search responses on persistent search requests and for unsolicited
> notifications.  Regardless once the persistent search request is issued,
> there's no way the client can stop it until size and time limits are
> reached.  These are parameters in the control sent on the search request.
> Of course persistent search requests can use a size limit of 1 and clients
> can request another persistent search after an event is received to have
> more control.  Regardless, a change may not occur for a while, in which
> case fine-tuning with the time limit will help.

The beauty of this cursor approach is that it can be used to
implement persistent search so easily ! In fact, a cursor is just a
persistent search, until you discard it :)
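For comparison, the JNDI side of change notification already exists as the event API; this listener is a trivial sketch (actually registering it requires a live server via `EventDirContext.addNamingListener`):

```java
import java.util.ArrayList;
import java.util.List;
import javax.naming.event.NamespaceChangeListener;
import javax.naming.event.NamingEvent;
import javax.naming.event.NamingExceptionEvent;

// Collects add/remove/rename notifications delivered by the provider.
// A smart client would share one dispatch thread across all registrations
// instead of dedicating a thread per persistent search.
final class ChangeCollector implements NamespaceChangeListener {
    final List<String> changes = new ArrayList<>();

    public void objectAdded(NamingEvent evt)   { changes.add("added: "   + evt.getNewBinding().getName()); }
    public void objectRemoved(NamingEvent evt) { changes.add("removed: " + evt.getOldBinding().getName()); }
    public void objectRenamed(NamingEvent evt) { changes.add("renamed: " + evt.getNewBinding().getName()); }
    public void namingExceptionThrown(NamingExceptionEvent evt) { changes.add("error: " + evt.getException()); }
}
```

Registration would look roughly like `eventCtx.addNamingListener("ou=users", "(objectClass=person)", searchControls, new ChangeCollector())`, with the names here purely illustrative.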


>
> It's a bit crazy to think what would happen if both these controls are used
> together on the same search request.  I guess it all depends on the server
> implementation in the end.  Not sure if anyone would even want to do this.
> Regardless of the pain it would entail, I think this situation can be
> managed to work in the server.  Now where does this lead us though?  Perhaps
> the Cursor interface could be enhanced to support a listener to
> asynchronously notify users of changes to the underlying results.  The
> Cursor can then be reset (or just repositioned).  Of course this is all
> presuming the Cursor was created to traverse results.  Instead a Cursor
> might just be created to iterate over only changes, but does this make
> sense?  Whatever the answer, at least we can know when underlying results
> have changed to invalidate the Cursor on the client.   This is probably a
> bullsh*t idea but it's entertaining to think about.

Yeah, not sure it's worth the price, but at least we have the tools to build it !

>
> BTW change notifications are probably best implemented as a combination of
> search and extended operations through unsolicited notifications.   The
> client issues a search request with a control similar to the persistent
> search request control.  Instead of 'persisting' the search, the search
> returns immediately with a search result done response using a result code
> to indicate whether or not the server will honor the request to be notified
> of changes.

This is a big semantic shift... Not sure that it will fit with the
current LDAP protocol. However, LDAP V4 does not exist yet ;)

> Then the client is done registering the request to be notified.
> Whenever the server detects changes that satisfy the changeType, scope and
> filter of the notification registration, it sends an unsolicited extended
> response to the client.  The payload carries information about the change
> which took place similar to the way search entry results do with the
> response control of persistent search.   The client can issue a
> deregistration message to the server to stop receiving these notifications
> using an ExtendedRequest.  The server would respond to this with an
> ExtendedResponse.  IMO this is a much better mechanism with full control
> over the subscription and notification process.
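A deregistration message like the one proposed could ride on the standard JNDI extended-operation plumbing; the OID below is made up purely for illustration, and the whole class is a sketch of the idea, not a real ApacheDS operation:

```java
import javax.naming.ldap.ExtendedRequest;
import javax.naming.ldap.ExtendedResponse;

// Hypothetical request asking the server to stop sending change
// notifications for a previously registered subscription.
final class DeregisterNotificationRequest implements ExtendedRequest {
    // Made-up OID for illustration; a real operation would need a registered one.
    static final String OID = "1.3.6.1.4.1.18060.0.9999";

    private final byte[] subscriptionId;

    DeregisterNotificationRequest(byte[] subscriptionId) { this.subscriptionId = subscriptionId; }

    public String getID() { return OID; }

    // A real implementation would BER-encode the subscription id here.
    public byte[] getEncodedValue() { return subscriptionId.clone(); }

    public ExtendedResponse createExtendedResponse(String id, byte[] berValue, int offset, int length) {
        // Minimal response holder; real code would decode berValue.
        return new ExtendedResponse() {
            public String getID() { return id; }
            public byte[] getEncodedValue() { return berValue; }
        };
    }
}
```

The client would send it with `ldapContext.extendedOperation(new DeregisterNotificationRequest(id))` and inspect the result code of the returned response.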

Those are also some good ideas. As we have to overload the JNDI
Listeners, I think we have to implement such a solution anyway.

Thanks Alex !

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com