Posted to user@syncope.apache.org by Timo Hatakka <ti...@helsinki.fi> on 2013/11/14 11:27:37 UTC

Timestamp based synchronization

Hi,

we have tried to use event/timestamp-based synchronization from an
external user database to the repository. It seems that Syncope calls the sync
and getLastSyncToken connector framework operations in the following order as
time goes on:

t1.1 sync, token = null
t1.2 getLastSyncToken
t2.1 sync, token = value got from t1.2
t2.2 getLastSyncToken
t3.1 sync, token = value got from t2.2
t3.2 getLastSyncToken

In ti.j, i is the task execution iteration number and j is the number of the
ICF operation called within that task execution.

Why is getLastSyncToken used at all? Could Syncope just continue from the time
that was processed in the earlier run, or use getLastSyncToken to test whether
there is something newer and then call sync only if something is available?

There seems to be quite a serious problem in the current implementation.
Suppose that we are about to process step t2.1 and there are three new
events e1, e2 and e3 available. Suppose further that two new
events e4 and e5 occur during t2.1. In t2.2 the token will be set to e5,
which means that in step t3.1 we will continue from e5 and e4 will be
missed!
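The scenario above can be reproduced with a small simulation. This is only a sketch under stated assumptions: `FakeConnector` and `run_sync_then_token` are hypothetical names, not the ICF or Syncope API; events are plain sequence numbers (1 for e1, and so on); the sync query is assumed to see only the events that exist when it starts; and the filter is assumed to be a strict ">".

```python
class FakeConnector:
    """Hypothetical in-memory stand-in for a connector (not the real ICF API)."""

    def __init__(self):
        self.events = []  # event sequence numbers, in arrival order

    def add(self, *ids):
        self.events.extend(ids)

    def get_last_sync_token(self):
        # Token of the newest event currently in the source, or None if empty.
        return max(self.events) if self.events else None

    def sync(self, token, arriving_during=()):
        # Assumption: the query only sees what exists when it starts.
        result = [e for e in self.events if token is None or e > token]
        # Events that arrive while the result is still being processed.
        self.add(*arriving_during)
        return result


def run_sync_then_token(conn, token, arriving_during=()):
    """Order described in the thread: sync first, then getLastSyncToken."""
    delivered = conn.sync(token, arriving_during)
    return delivered, conn.get_last_sync_token()


conn = FakeConnector()
seen = []

# t1: the source is still empty and the token starts as None.
delivered, token = run_sync_then_token(conn, None)
seen += delivered

# e1..e3 are waiting before t2.1; e4 and e5 arrive during t2.1.
conn.add(1, 2, 3)
delivered, token = run_sync_then_token(conn, token, arriving_during=(4, 5))
seen += delivered

# t3: continues from the token read in t2.2.
delivered, token = run_sync_then_token(conn, token)
seen += delivered

missed = sorted(set(conn.events) - set(seen))
print("delivered:", seen)
print("missed:", missed)
```

Under these assumptions the run delivers e1 to e3 only. With the strict filter e5 is lost along with e4; an inclusive ">=" filter would pick e5 up again in t3.1 but would still skip e4.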

We have tested this with the database table connector and the scripted SQL
connector, and the problem exists in both implementations.

Thanks!

Timo Hatakka

Re: Timestamp based synchronization

Posted by Timo-V Hatakka <ti...@helsinki.fi>.

Hi!

>>>>
>>>> we have tried to use event / timestamp based synchronization from  
>>>> an external user database to repository. It seems that Syncope  
>>>> calls sync and getLastSyncToken connector framework operations in  
>>>> a following manner when time goes on:
>>>>
>>>> t1.1 sync, token = null
>>>> t1.2 getLastSyncToken
>>>> t2.1 sync, token = value got from t1.2
>>>> t2.2 getLastSyncToken
>>>> t3.1 sync, token = value got from t2.2
>>>> t3.2 getLastSyncToken
>>>>
>>>> in ti.j i means task execution iteration number and j means  
>>>> called ICF operation number in current task execution.
>>>>
>>>> Why getLastSyncToken is used at all? Could it just continue from  
>>>> the time which was processed in an earlier run, or use  
>>>> getLastSyncToken to test if there is something newer and then  
>>>> call sync only if something is available?
>>>>
>>>> There seems to be quite a serious problem in the current  
>>>> implementation. Let's suppose that we are going to process step  
>>>> t2.1 and there are three new events e1, e2 and e3 available. And  
>>>> let us further suppose that two new items e4 and e5 will happen  
>>>> during t2.1. In t2.2 token will be set to e5 which means that in  
>>>> the step t3.1 we will continue from e5 and e4 will be missed!
>>>>
>>>> We have tested this with database table connector and scripted  
>>>> sql connector and the problem exists in both implementations.
>>>>
>>> Hi Timo, yes, you are right, there is a big problem.
>>> I'm quite sure it wasn't in the past: order of calls should be inverted.
>>>
>>> I mean:
>>>
>>> 1. get the new sync token
>>> 2. execute sync with the old one
>>> 3. replace the old one with the new one
>>>
>>> In this case we risk performing the same operation twice but, at  
>>> least, that shouldn't be a problem.
>>>
>>> Does it make sense?
>>
>> Yes. I suppose that this depends on the comparison operator used,  
>> ">" or ">=". DB seems to use ">" and scripted SQL examples ">=".
>>
>>> Why don't you open a new issue for this problem? The fix will be  
>>> released into the 1.1.5 and 1.2.0.
>>
>> I will!
>
> Hi,
> I've seen you've opened SYNCOPE-440, thanks!
>
> Would you be able to provide a patch? The change is likely  
> located in the SyncJob class [1].
> You can find some guidelines about how to submit a patch [2].

Sorry, but we are not able to patch it just now. We are
doing quite deep POC testing, which should be finished in a couple of
weeks. After that we will decide whether to start a pilot project. If
this is still open, we could try to fix it then. However, we are not
very familiar with the internal design of Syncope, so it could be much
easier for you!

Regards,
Timo

Re: Timestamp based synchronization

Posted by Francesco Chicchiriccò <il...@apache.org>.

On 15/11/2013 07:22, Timo-V Hatakka wrote:
> Hi,
>
>> On 14/11/2013 11:27, Timo Hatakka wrote:
>>>
>>> Hi,
>>>
>>> we have tried to use event / timestamp based synchronization from an 
>>> external user database to repository. It seems that Syncope calls 
>>> sync and getLastSyncToken connector framework operations in a 
>>> following manner when time goes on:
>>>
>>> t1.1 sync, token = null
>>> t1.2 getLastSyncToken
>>> t2.1 sync, token = value got from t1.2
>>> t2.2 getLastSyncToken
>>> t3.1 sync, token = value got from t2.2
>>> t3.2 getLastSyncToken
>>>
>>> in ti.j i means task execution iteration number and j means called 
>>> ICF operation number in current task execution.
>>>
>>> Why getLastSyncToken is used at all? Could it just continue from the 
>>> time which was processed in an earlier run, or use getLastSyncToken 
>>> to test if there is something newer and then call sync only if 
>>> something is available?
>>>
>>> There seems to be quite a serious problem in the current 
>>> implementation. Let's suppose that we are going to process step t2.1 
>>> and there are three new events e1, e2 and e3 available. And let us 
>>> further suppose that two new items e4 and e5 will happen during 
>>> t2.1. In t2.2 token will be set to e5 which means that in the step 
>>> t3.1 we will continue from e5 and e4 will be missed!
>>>
>>> We have tested this with database table connector and scripted sql 
>>> connector and the problem exists in both implementations.
>>>
>> Hi Timo, yes, you are right, there is a big problem.
>> I'm quite sure it wasn't in the past: order of calls should be inverted.
>>
>> I mean:
>>
>> 1. get the new sync token
>> 2. execute sync with the old one
>> 3. replace the old one with the new one
>>
>> In this case we risk performing the same operation twice but, at 
>> least, that shouldn't be a problem.
>>
>> Does it make sense?
>
> Yes. I suppose that this depends on the comparison operator used, ">" 
> or ">=". DB seems to use ">" and scripted SQL examples ">=".
>
>> Why don't you open a new issue for this problem? The fix will be 
>> released into the 1.1.5 and 1.2.0.
>
> I will!

Hi,
I've seen you've opened SYNCOPE-440, thanks!

Would you be able to provide a patch? The change is likely
located in the SyncJob class [1].
You can find some guidelines about how to submit a patch [2].

Regards.

[1] 
https://svn.apache.org/repos/asf/syncope/branches/1_1_X/core/src/main/java/org/apache/syncope/core/sync/impl/SyncJob.java
[2] http://syncope.apache.org/contributing.html#Code

-- 
Francesco Chicchiriccò

Tirasa - Open Source Excellence
http://www.tirasa.net/

ASF Member, Apache Syncope PMC chair, Apache Cocoon PMC Member
http://people.apache.org/~ilgrosso/

Re: Timestamp based synchronization

Posted by Timo-V Hatakka <ti...@helsinki.fi>.

Hi,

> On 14/11/2013 11:27, Timo Hatakka wrote:
>>
>> Hi,
>>
>> we have tried to use event / timestamp based synchronization from  
>> an external user database to repository. It seems that Syncope  
>> calls sync and getLastSyncToken connector framework operations in a  
>> following manner when time goes on:
>>
>> t1.1 sync, token = null
>> t1.2 getLastSyncToken
>> t2.1 sync, token = value got from t1.2
>> t2.2 getLastSyncToken
>> t3.1 sync, token = value got from t2.2
>> t3.2 getLastSyncToken
>>
>> in ti.j i means task execution iteration number and j means called  
>> ICF operation number in current task execution.
>>
>> Why getLastSyncToken is used at all? Could it just continue from  
>> the time which was processed in an earlier run, or use  
>> getLastSyncToken to test if there is something newer and then call  
>> sync only if something is available?
>>
>> There seems to be quite a serious problem in the current  
>> implementation. Let's suppose that we are going to process step  
>> t2.1 and there are three new events e1, e2 and e3 available. And  
>> let us further suppose that two new items e4 and e5 will happen  
>> during t2.1. In t2.2 token will be set to e5 which means that in  
>> the step t3.1 we will continue from e5 and e4 will be missed!
>>
>> We have tested this with database table connector and scripted sql  
>> connector and the problem exists in both implementations.
>>
> Hi Timo, yes, you are right, there is a big problem.
> I'm quite sure it wasn't in the past: order of calls should be inverted.
>
> I mean:
>
> 1. get the new sync token
> 2. execute sync with the old one
> 3. replace the old one with the new one
>
> In this case we risk performing the same operation twice but, at  
> least, that shouldn't be a problem.
>
> Does it make sense?

Yes. I suppose that this depends on the comparison operator used, ">"
or ">=". The DB connector seems to use ">" and the scripted SQL examples ">=".
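To illustrate why the operator matters, here is a minimal sketch with an in-memory list of rows. Everything in it is hypothetical: the rows, the timestamps and the `changed_since` helper are made up for illustration and do not reproduce the actual queries of the database table connector or the scripted SQL examples.

```python
# Each row is (name, last_modified_timestamp).
rows = [("alice", 100), ("bob", 200)]

# The token is taken right after "bob" was written.
token = 200

# "carol" is written later but gets the same timestamp as the token
# (for example because of coarse timestamp resolution); "dave" is newer.
rows.append(("carol", 200))
rows.append(("dave", 300))


def changed_since(rows, token, inclusive):
    """Return the names of rows selected by a '>' or '>=' timestamp filter."""
    if inclusive:
        return [name for name, ts in rows if ts >= token]
    return [name for name, ts in rows if ts > token]


strict = changed_since(rows, token, inclusive=False)
inclusive = changed_since(rows, token, inclusive=True)

print("with >  :", strict)     # carol is skipped
print("with >= :", inclusive)  # bob is delivered a second time
```

In this sketch ">" never repeats a row but can skip one that shares the token's timestamp, while ">=" never skips but re-delivers the boundary row, so the consumer has to tolerate repeated updates.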

> Why don't you open a new issue for this problem? The fix will be  
> released into the 1.1.5 and 1.2.0.

I will!

Best regards,

Timo


Re: Timestamp based synchronization

Posted by Fabio Martelli <fa...@gmail.com>.

On 14/11/2013 11:27, Timo Hatakka wrote:
>
> Hi,
>
> we have tried to use event / timestamp based synchronization from an 
> external user database to repository. It seems that Syncope calls sync 
> and getLastSyncToken connector framework operations in a following 
> manner when time goes on:
>
> t1.1 sync, token = null
> t1.2 getLastSyncToken
> t2.1 sync, token = value got from t1.2
> t2.2 getLastSyncToken
> t3.1 sync, token = value got from t2.2
> t3.2 getLastSyncToken
>
> in ti.j i means task execution iteration number and j means called ICF 
> operation number in current task execution.
>
> Why getLastSyncToken is used at all? Could it just continue from the 
> time which was processed in an earlier run, or use getLastSyncToken to 
> test if there is something newer and then call sync only if something 
> is available?
>
> There seems to be quite a serious problem in the current 
> implementation. Let's suppose that we are going to process step t2.1 
> and there are three new events e1, e2 and e3 available. And let us 
> further suppose that two new items e4 and e5 will happen during t2.1. 
> In t2.2 token will be set to e5 which means that in the step t3.1 we 
> will continue from e5 and e4 will be missed!
>
> We have tested this with database table connector and scripted sql 
> connector and the problem exists in both implementations.
>
Hi Timo, yes, you are right, there is a big problem.
I'm quite sure it wasn't there in the past: the order of the calls should be
inverted.

I mean:

1. get the new sync token
2. execute sync with the old one
3. replace the old one with the new one

In this case we risk performing the same operation twice but, at least,
that shouldn't be a problem.

Does it make sense?
Why don't you open a new issue for this problem? The fix will be
released in 1.1.5 and 1.2.0.
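The three steps above can be sketched as follows. This is an illustration only, not the SyncJob code: `FakeConnector` and `run_token_then_sync` are hypothetical names, events are plain sequence numbers, and the filter is assumed to be a strict ">".

```python
from collections import Counter


class FakeConnector:
    """Hypothetical in-memory stand-in for a connector (not the real ICF API)."""

    def __init__(self):
        self.events = []  # event sequence numbers, in arrival order

    def get_last_sync_token(self):
        return max(self.events) if self.events else None

    def sync(self, token):
        return [e for e in self.events if token is None or e > token]


def run_token_then_sync(conn, old_token, arriving_between=()):
    new_token = conn.get_last_sync_token()  # 1. get the new sync token
    conn.events.extend(arriving_between)    # events landing between the two calls
    delivered = conn.sync(old_token)        # 2. execute sync with the old token
    return delivered, new_token             # 3. the new token replaces the old one


conn = FakeConnector()
seen = []
token = None

# e1..e3 are waiting; e4 arrives after the token is read but before sync runs.
conn.events.extend([1, 2, 3])
delivered, token = run_token_then_sync(conn, token, arriving_between=(4,))
seen += delivered

# e5 arrives after the first run has finished.
conn.events.append(5)
delivered, token = run_token_then_sync(conn, token)
seen += delivered

missed = sorted(set(conn.events) - set(seen))
duplicates = sorted(e for e, n in Counter(seen).items() if n > 1)
print("delivered:", seen)
print("missed:", missed)
print("duplicates:", duplicates)
```

In this sketch nothing is missed, and e4 is delivered twice because it arrived between reading the token and running sync, which is the repeated operation mentioned above.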

Best regards,
F.



> Thanks!
>
> Timo Hatakka
>


-- 
Fabio Martelli

Tirasa - Open Source Excellence
http://www.tirasa.net/

Apache Syncope PMC
http://people.apache.org/~fmartelli/