You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@directory.apache.org by Emmanuel Lecharny <el...@gmail.com> on 2011/06/15 17:19:24 UTC

Alias handling issues

Hi,

sadly, things are not as rosy as I exepected...

I have written a few tests to check if we correctly detect cycles while 
processing a search, and at some point, I found that cycle are not an 
issue at all. But wait, it's not a good news :/

Let's say you have the following entries :

cn=test
   cn=foo,cn=test
     cn=barAlias,cn=foo,cn=test -> cn=bar,cn=test
   cn=bar,cn=test
     cn=dohAlias,cn=foo,cn=test -> cn=doh,cn=test
   cn=doh,cn=test

Logically, doing a SUBTREE search on cn=foo,cn=test, you should get the 
following entries :
cn=foo,cn=test
cn=bar,cn=test
cn=doh,cn=test

This is *not* what we get. Currently, you'll have :
cn=foo,cn=test
cn=bar,cn=test

The second alias dereferencing is never done.

Obviously, it helps when it come sto avoid cycle, but this is certainly 
not the expected behavior.

Now, if we want to do alias chasing on the server, we will have to 
modify the way we handle alias : each one of them will issue a new 
search, wth a new cursor.

Hopefully, stacking the aliases to be processed works well with the fact 
that we have to stack the aliases for cycle detection. One possible 
solution would be to process this stack alias after alias, and if we get 
back an alias, we add it in the stack if it's not already present 
(otherwise, that means we have a cycle).

In our sample, we will have the following stack :
() : empty stack
(cn=barAlias,cn=foo,cn=test) : first alias met
-> here, we have processed all the entries for the initial search
   [cn=foo,cn=test]
   [cn=bar,cn=test] (the dereferenced alias)
-> now, we get the leftmost element in the stack, and launch a new search
(<cn=barAlias,cn=foo,cn=test>) : this alias is being processed, thus the <>
(<cn=barAlias,cn=foo,cn=test>, cn=dohAlias,cn=foo,cn=test) : second 
alias met
-> again, all the entries have been processed, we take the next alias in 
the stack
   [cn=doh,cn=test]
(<cn=barAlias,cn=foo,cn=test>, <cn=dohAlias,cn=foo,cn=test>) : second 
alias met
-> there are no further entries. We are done

Seems to work... Did I miss something ?

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Alias handling issues

Posted by Emmanuel Lécharny <el...@apache.org>.
On 6/15/11 7:11 PM, Stefan Seelmann wrote:
> On Wed, Jun 15, 2011 at 7:01 PM, Emmanuel Lecharny<el...@gmail.com>  wrote:
>> On 6/15/11 6:17 PM, Pierre-Arnaud Marcelot wrote:
>>> On 15 juin 2011, at 18:01, Emmanuel Lecharny wrote:
>>>
>>>> On 6/15/11 5:19 PM, Emmanuel Lecharny wrote:
>>>>> Hi,
>>>>>
>>>>> sadly, things are not as rosy as I exepected...
>>>>>
>>>>> I have written a few tests to check if we correctly detect cycles while
>>>>> processing a search, and at some point, I found that cycle are not an issue
>>>>> at all. But wait, it's not a good news :/
>>>>>
>>>>> Let's say you have the following entries :
>>>>>
>>>>> cn=test
>>>>>   cn=foo,cn=test
>>>>>     cn=barAlias,cn=foo,cn=test ->    cn=bar,cn=test
>>>>>   cn=bar,cn=test
>>>>>     cn=dohAlias,cn=foo,cn=test ->    cn=doh,cn=test
>>>>>   cn=doh,cn=test
>>>>>
>>>>> Logically, doing a SUBTREE search on cn=foo,cn=test, you should get the
>>>>> following entries :
>>>>> cn=foo,cn=test
>>>>> cn=bar,cn=test
>>>>> cn=doh,cn=test
>>>>>
>>>>> This is *not* what we get. Currently, you'll have :
>>>>> cn=foo,cn=test
>>>>> cn=bar,cn=test
>>>>>
>>>>> The second alias dereferencing is never done.
>>>>>
>>>>> Obviously, it helps when it come sto avoid cycle, but this is certainly
>>>>> not the expected behavior.
>>>>>
>>>>> Now, if we want to do alias chasing on the server, we will have to
>>>>> modify the way we handle alias : each one of them will issue a new search,
>>>>> wth a new cursor.
>>>>>
>>>>> Hopefully, stacking the aliases to be processed works well with the fact
>>>>> that we have to stack the aliases for cycle detection. One possible solution
>>>>> would be to process this stack alias after alias, and if we get back an
>>>>> alias, we add it in the stack if it's not already present (otherwise, that
>>>>> means we have a cycle).
>>>>>
>>>>> In our sample, we will have the following stack :
>>>>> () : empty stack
>>>>> (cn=barAlias,cn=foo,cn=test) : first alias met
>>>>> ->    here, we have processed all the entries for the initial search
>>>>>   [cn=foo,cn=test]
>>>>>   [cn=bar,cn=test] (the dereferenced alias)
>>>>> ->    now, we get the leftmost element in the stack, and launch a new
>>>>> search
>>>>> (<cn=barAlias,cn=foo,cn=test>) : this alias is being processed, thus
>>>>> the<>
>>>>> (<cn=barAlias,cn=foo,cn=test>, cn=dohAlias,cn=foo,cn=test) : second
>>>>> alias met
>>>>> ->    again, all the entries have been processed, we take the next alias
>>>>> in the stack
>>>>>   [cn=doh,cn=test]
>>>>> (<cn=barAlias,cn=foo,cn=test>,<cn=dohAlias,cn=foo,cn=test>) : second
>>>>> alias met
>>>>> ->    there are no further entries. We are done
>>>>>
>>>>> Seems to work... Did I miss something ?
>>>> Yes, I missed something (thanks Pierre-Arnaud for pointing this out) :
>>>>
>>>> if the alias get dereferenced to an entry below one of the DN already
>>>> stored in the stack, or to the descendant of one of the stored DNs, then we
>>>> don't need to proceed the search for this alias, as the entries have already
>>>> been provided. That will avoid sending duplicate entries to the user.
>>> Yeah, a basic example would be to execute the same search as above but one
>>> level higher, from "cn=test".
>>>
>>> In that case, the first search will already give us the three resulting
>>> entries:
>>> - cn=foo,cn=test
>>> - cn=bar,cn=test
>>> - cn=doh,cn=test
>>>
>>> And then, in the other searches, triggered when following aliases, some of
>>> these entries would match again. For instance:
>>> - cn=bar,cn=test
>>> - cn=doh,cn=test
>>>
>>> We have to make sure that we don't send them multiple times to the user.
>>> Maybe by storing a list of all the DNs we already returned to the user...
>>> But that might be a little bit overkill and suck too much memory.
>> By storing the root DN of each consecutive search, we will avoid such
>> problem.
>>
>> In this case, the initial search is done with cn=foo,cn=test, and this is
>> the stored DN.
>> The first search will just returns the cn=foo,cn=test entry and the
>> cn=barAlias,cn=foo,cn=test entry, which will be dereferenced.
>>
>> This dereferemced operation consists on doing a search on the aliasedObject
>> stored in the alias. As it's a new search, we store the base DN
>> (cn=bar,cn=test), and we launch the new search. Etc...
>>
>> Should work pretty well without having to store all the entries' DNs
> Yes, that stored DNs help for both, (1) avoiding duplicates and (2)
> avoiding cycles. But remember, for millions of aliases you have to
> store millions of DNs ;-)
Too bad for the millions of aliases :)

Those using too many aliases may :
- either have a very badly designed database
- or managing the CIA employees...

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Alias handling issues

Posted by Stefan Seelmann <se...@apache.org>.
On Wed, Jun 15, 2011 at 7:01 PM, Emmanuel Lecharny <el...@gmail.com> wrote:
> On 6/15/11 6:17 PM, Pierre-Arnaud Marcelot wrote:
>>
>> On 15 juin 2011, at 18:01, Emmanuel Lecharny wrote:
>>
>>> On 6/15/11 5:19 PM, Emmanuel Lecharny wrote:
>>>>
>>>> Hi,
>>>>
>>>> sadly, things are not as rosy as I exepected...
>>>>
>>>> I have written a few tests to check if we correctly detect cycles while
>>>> processing a search, and at some point, I found that cycle are not an issue
>>>> at all. But wait, it's not a good news :/
>>>>
>>>> Let's say you have the following entries :
>>>>
>>>> cn=test
>>>>  cn=foo,cn=test
>>>>    cn=barAlias,cn=foo,cn=test ->  cn=bar,cn=test
>>>>  cn=bar,cn=test
>>>>    cn=dohAlias,cn=foo,cn=test ->  cn=doh,cn=test
>>>>  cn=doh,cn=test
>>>>
>>>> Logically, doing a SUBTREE search on cn=foo,cn=test, you should get the
>>>> following entries :
>>>> cn=foo,cn=test
>>>> cn=bar,cn=test
>>>> cn=doh,cn=test
>>>>
>>>> This is *not* what we get. Currently, you'll have :
>>>> cn=foo,cn=test
>>>> cn=bar,cn=test
>>>>
>>>> The second alias dereferencing is never done.
>>>>
>>>> Obviously, it helps when it come sto avoid cycle, but this is certainly
>>>> not the expected behavior.
>>>>
>>>> Now, if we want to do alias chasing on the server, we will have to
>>>> modify the way we handle alias : each one of them will issue a new search,
>>>> wth a new cursor.
>>>>
>>>> Hopefully, stacking the aliases to be processed works well with the fact
>>>> that we have to stack the aliases for cycle detection. One possible solution
>>>> would be to process this stack alias after alias, and if we get back an
>>>> alias, we add it in the stack if it's not already present (otherwise, that
>>>> means we have a cycle).
>>>>
>>>> In our sample, we will have the following stack :
>>>> () : empty stack
>>>> (cn=barAlias,cn=foo,cn=test) : first alias met
>>>> ->  here, we have processed all the entries for the initial search
>>>>  [cn=foo,cn=test]
>>>>  [cn=bar,cn=test] (the dereferenced alias)
>>>> ->  now, we get the leftmost element in the stack, and launch a new
>>>> search
>>>> (<cn=barAlias,cn=foo,cn=test>) : this alias is being processed, thus
>>>> the<>
>>>> (<cn=barAlias,cn=foo,cn=test>, cn=dohAlias,cn=foo,cn=test) : second
>>>> alias met
>>>> ->  again, all the entries have been processed, we take the next alias
>>>> in the stack
>>>>  [cn=doh,cn=test]
>>>> (<cn=barAlias,cn=foo,cn=test>,<cn=dohAlias,cn=foo,cn=test>) : second
>>>> alias met
>>>> ->  there are no further entries. We are done
>>>>
>>>> Seems to work... Did I miss something ?
>>>
>>> Yes, I missed something (thanks Pierre-Arnaud for pointing this out) :
>>>
>>> if the alias get dereferenced to an entry below one of the DN already
>>> stored in the stack, or to the descendant of one of the stored DNs, then we
>>> don't need to proceed the search for this alias, as the entries have already
>>> been provided. That will avoid sending duplicate entries to the user.
>>
>> Yeah, a basic example would be to execute the same search as above but one
>> level higher, from "cn=test".
>>
>> In that case, the first search will already give us the three resulting
>> entries:
>> - cn=foo,cn=test
>> - cn=bar,cn=test
>> - cn=doh,cn=test
>>
>> And then, in the other searches, triggered when following aliases, some of
>> these entries would match again. For instance:
>> - cn=bar,cn=test
>> - cn=doh,cn=test
>>
>> We have to make sure that we don't send them multiple times to the user.
>> Maybe by storing a list of all the DNs we already returned to the user...
>> But that might be a little bit overkill and suck too much memory.
>
> By storing the root DN of each consecutive search, we will avoid such
> problem.
>
> In this case, the initial search is done with cn=foo,cn=test, and this is
> the stored DN.
> The first search will just returns the cn=foo,cn=test entry and the
> cn=barAlias,cn=foo,cn=test entry, which will be dereferenced.
>
> This dereferemced operation consists on doing a search on the aliasedObject
> stored in the alias. As it's a new search, we store the base DN
> (cn=bar,cn=test), and we launch the new search. Etc...
>
> Should work pretty well without having to store all the entries' DNs

Yes, that stored DNs help for both, (1) avoiding duplicates and (2)
avoiding cycles. But remember, for millions of aliases you have to
store millions of DNs ;-)

Kind Regards,
Stefan

Re: Alias handling issues

Posted by Emmanuel Lecharny <el...@gmail.com>.
On 6/15/11 6:17 PM, Pierre-Arnaud Marcelot wrote:
> On 15 juin 2011, at 18:01, Emmanuel Lecharny wrote:
>
>> On 6/15/11 5:19 PM, Emmanuel Lecharny wrote:
>>> Hi,
>>>
>>> sadly, things are not as rosy as I exepected...
>>>
>>> I have written a few tests to check if we correctly detect cycles while processing a search, and at some point, I found that cycle are not an issue at all. But wait, it's not a good news :/
>>>
>>> Let's say you have the following entries :
>>>
>>> cn=test
>>>   cn=foo,cn=test
>>>     cn=barAlias,cn=foo,cn=test ->  cn=bar,cn=test
>>>   cn=bar,cn=test
>>>     cn=dohAlias,cn=foo,cn=test ->  cn=doh,cn=test
>>>   cn=doh,cn=test
>>>
>>> Logically, doing a SUBTREE search on cn=foo,cn=test, you should get the following entries :
>>> cn=foo,cn=test
>>> cn=bar,cn=test
>>> cn=doh,cn=test
>>>
>>> This is *not* what we get. Currently, you'll have :
>>> cn=foo,cn=test
>>> cn=bar,cn=test
>>>
>>> The second alias dereferencing is never done.
>>>
>>> Obviously, it helps when it come sto avoid cycle, but this is certainly not the expected behavior.
>>>
>>> Now, if we want to do alias chasing on the server, we will have to modify the way we handle alias : each one of them will issue a new search, wth a new cursor.
>>>
>>> Hopefully, stacking the aliases to be processed works well with the fact that we have to stack the aliases for cycle detection. One possible solution would be to process this stack alias after alias, and if we get back an alias, we add it in the stack if it's not already present (otherwise, that means we have a cycle).
>>>
>>> In our sample, we will have the following stack :
>>> () : empty stack
>>> (cn=barAlias,cn=foo,cn=test) : first alias met
>>> ->  here, we have processed all the entries for the initial search
>>>   [cn=foo,cn=test]
>>>   [cn=bar,cn=test] (the dereferenced alias)
>>> ->  now, we get the leftmost element in the stack, and launch a new search
>>> (<cn=barAlias,cn=foo,cn=test>) : this alias is being processed, thus the<>
>>> (<cn=barAlias,cn=foo,cn=test>, cn=dohAlias,cn=foo,cn=test) : second alias met
>>> ->  again, all the entries have been processed, we take the next alias in the stack
>>>   [cn=doh,cn=test]
>>> (<cn=barAlias,cn=foo,cn=test>,<cn=dohAlias,cn=foo,cn=test>) : second alias met
>>> ->  there are no further entries. We are done
>>>
>>> Seems to work... Did I miss something ?
>> Yes, I missed something (thanks Pierre-Arnaud for pointing this out) :
>>
>> if the alias get dereferenced to an entry below one of the DN already stored in the stack, or to the descendant of one of the stored DNs, then we don't need to proceed the search for this alias, as the entries have already been provided. That will avoid sending duplicate entries to the user.
> Yeah, a basic example would be to execute the same search as above but one level higher, from "cn=test".
>
> In that case, the first search will already give us the three resulting entries:
> - cn=foo,cn=test
> - cn=bar,cn=test
> - cn=doh,cn=test
>
> And then, in the other searches, triggered when following aliases, some of these entries would match again. For instance:
> - cn=bar,cn=test
> - cn=doh,cn=test
>
> We have to make sure that we don't send them multiple times to the user.
> Maybe by storing a list of all the DNs we already returned to the user... But that might be a little bit overkill and suck too much memory.
By storing the root DN of each consecutive search, we will avoid such 
problem.

In this case, the initial search is done with cn=foo,cn=test, and this 
is the stored DN.
The first search will just returns the cn=foo,cn=test entry and the 
cn=barAlias,cn=foo,cn=test entry, which will be dereferenced.

This dereferemced operation consists on doing a search on the 
aliasedObject stored in the alias. As it's a new search, we store the 
base DN (cn=bar,cn=test), and we launch the new search. Etc...

Should work pretty well without having to store all the entries' DNs


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Alias handling issues

Posted by Pierre-Arnaud Marcelot <pa...@marcelot.net>.
On 15 juin 2011, at 18:01, Emmanuel Lecharny wrote:

> On 6/15/11 5:19 PM, Emmanuel Lecharny wrote:
>> Hi,
>> 
>> sadly, things are not as rosy as I exepected...
>> 
>> I have written a few tests to check if we correctly detect cycles while processing a search, and at some point, I found that cycle are not an issue at all. But wait, it's not a good news :/
>> 
>> Let's say you have the following entries :
>> 
>> cn=test
>>  cn=foo,cn=test
>>    cn=barAlias,cn=foo,cn=test -> cn=bar,cn=test
>>  cn=bar,cn=test
>>    cn=dohAlias,cn=foo,cn=test -> cn=doh,cn=test
>>  cn=doh,cn=test
>> 
>> Logically, doing a SUBTREE search on cn=foo,cn=test, you should get the following entries :
>> cn=foo,cn=test
>> cn=bar,cn=test
>> cn=doh,cn=test
>> 
>> This is *not* what we get. Currently, you'll have :
>> cn=foo,cn=test
>> cn=bar,cn=test
>> 
>> The second alias dereferencing is never done.
>> 
>> Obviously, it helps when it come sto avoid cycle, but this is certainly not the expected behavior.
>> 
>> Now, if we want to do alias chasing on the server, we will have to modify the way we handle alias : each one of them will issue a new search, wth a new cursor.
>> 
>> Hopefully, stacking the aliases to be processed works well with the fact that we have to stack the aliases for cycle detection. One possible solution would be to process this stack alias after alias, and if we get back an alias, we add it in the stack if it's not already present (otherwise, that means we have a cycle).
>> 
>> In our sample, we will have the following stack :
>> () : empty stack
>> (cn=barAlias,cn=foo,cn=test) : first alias met
>> -> here, we have processed all the entries for the initial search
>>  [cn=foo,cn=test]
>>  [cn=bar,cn=test] (the dereferenced alias)
>> -> now, we get the leftmost element in the stack, and launch a new search
>> (<cn=barAlias,cn=foo,cn=test>) : this alias is being processed, thus the <>
>> (<cn=barAlias,cn=foo,cn=test>, cn=dohAlias,cn=foo,cn=test) : second alias met
>> -> again, all the entries have been processed, we take the next alias in the stack
>>  [cn=doh,cn=test]
>> (<cn=barAlias,cn=foo,cn=test>, <cn=dohAlias,cn=foo,cn=test>) : second alias met
>> -> there are no further entries. We are done
>> 
>> Seems to work... Did I miss something ?
> Yes, I missed something (thanks Pierre-Arnaud for pointing this out) :
> 
> if the alias get dereferenced to an entry below one of the DN already stored in the stack, or to the descendant of one of the stored DNs, then we don't need to proceed the search for this alias, as the entries have already been provided. That will avoid sending duplicate entries to the user.

Yeah, a basic example would be to execute the same search as above but one level higher, from "cn=test".

In that case, the first search will already give us the three resulting entries:
- cn=foo,cn=test
- cn=bar,cn=test
- cn=doh,cn=test

And then, in the other searches, triggered when following aliases, some of these entries would match again. For instance:
- cn=bar,cn=test
- cn=doh,cn=test

We have to make sure that we don't send them multiple times to the user.
Maybe by storing a list of all the DNs we already returned to the user... But that might be a little bit overkill and suck too much memory.

Regards,
Pierre-Arnaud

Re: Alias handling issues

Posted by Emmanuel Lecharny <el...@gmail.com>.
On 6/15/11 5:19 PM, Emmanuel Lecharny wrote:
> Hi,
>
> sadly, things are not as rosy as I exepected...
>
> I have written a few tests to check if we correctly detect cycles 
> while processing a search, and at some point, I found that cycle are 
> not an issue at all. But wait, it's not a good news :/
>
> Let's say you have the following entries :
>
> cn=test
>   cn=foo,cn=test
>     cn=barAlias,cn=foo,cn=test -> cn=bar,cn=test
>   cn=bar,cn=test
>     cn=dohAlias,cn=foo,cn=test -> cn=doh,cn=test
>   cn=doh,cn=test
>
> Logically, doing a SUBTREE search on cn=foo,cn=test, you should get 
> the following entries :
> cn=foo,cn=test
> cn=bar,cn=test
> cn=doh,cn=test
>
> This is *not* what we get. Currently, you'll have :
> cn=foo,cn=test
> cn=bar,cn=test
>
> The second alias dereferencing is never done.
>
> Obviously, it helps when it come sto avoid cycle, but this is 
> certainly not the expected behavior.
>
> Now, if we want to do alias chasing on the server, we will have to 
> modify the way we handle alias : each one of them will issue a new 
> search, wth a new cursor.
>
> Hopefully, stacking the aliases to be processed works well with the 
> fact that we have to stack the aliases for cycle detection. One 
> possible solution would be to process this stack alias after alias, 
> and if we get back an alias, we add it in the stack if it's not 
> already present (otherwise, that means we have a cycle).
>
> In our sample, we will have the following stack :
> () : empty stack
> (cn=barAlias,cn=foo,cn=test) : first alias met
> -> here, we have processed all the entries for the initial search
>   [cn=foo,cn=test]
>   [cn=bar,cn=test] (the dereferenced alias)
> -> now, we get the leftmost element in the stack, and launch a new search
> (<cn=barAlias,cn=foo,cn=test>) : this alias is being processed, thus 
> the <>
> (<cn=barAlias,cn=foo,cn=test>, cn=dohAlias,cn=foo,cn=test) : second 
> alias met
> -> again, all the entries have been processed, we take the next alias 
> in the stack
>   [cn=doh,cn=test]
> (<cn=barAlias,cn=foo,cn=test>, <cn=dohAlias,cn=foo,cn=test>) : second 
> alias met
> -> there are no further entries. We are done
>
> Seems to work... Did I miss something ?
Yes, I missed something (thanks Pierre-Arnaud for pointing this out) :

if the alias get dereferenced to an entry below one of the DN already 
stored in the stack, or to the descendant of one of the stored DNs, then 
we don't need to proceed the search for this alias, as the entries have 
already been provided. That will avoid sending duplicate entries to the 
user.



-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com