You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@syncope.apache.org by Francesco Chicchiriccò <il...@apache.org> on 2013/11/15 08:24:57 UTC

Re: Strange SQL operation

On 15/11/2013 08:08, Timo Hatakka wrote:
>
> Hi,
>
> we made some SQL level investigations about time consumption of 
> synchronization task processes. We started with an empty repository 
> and processed the same external user table (of 5 000 users) twice. 
> First run was processed more quickly and the time used for SQL 
> operations was divided for handling of ACT_GE_BYTEARRAY, UAttrValue, 
> ACT_HI_ACTINST, Policy, SyncopeUser, UAttr and USchema tables. The 
> next run was 150%-200% slower and DB operations took about 50% of the 
> time. We noticed that almost 75% of DB operations was used in SELECT 
> query to table ACT_RU_TASK. The exact SQL clause was "select * from 
> ACT_RU_TASK where PARENT_TASK_ID_ = 'XXXXX'". There is no index for 
> PARENT_TASK_ID_ column and in our runs the column value is always 
> null. Is this some kind of bug as it makes updates very slow?
>

Hi Timo,
thanks for your SQL analysis.
BTW, which DBMS are you using? If MySQL, with InnoDB as default engine 
or not?

The table mentioned above involved in the critical queries (ACT_RU_TASK) 
is actually an Activiti ([1] the default user workflow engine) table, 
not under Syncope direct control: you can try to manually define some 
indexes and check if there is any significant improvement.

Alternatively, we can see if usage of API call(s) triggering such query 
can be lowered from Syncope code - mainly the 
ActivitiUserWorkflowAdapter [2], but I'm dubious.

Regards.

[1] http://www.activiti.org
[2] 
http://svn.apache.org/repos/asf/syncope/branches/1_1_X/core/src/main/java/org/apache/syncope/core/workflow/user/activiti/ActivitiUserWorkflowAdapter.java

-- 
Francesco Chicchiriccò

Tirasa - Open Source Excellence
http://www.tirasa.net/

ASF Member, Apache Syncope PMC chair, Apache Cocoon PMC Member
http://people.apache.org/~ilgrosso/


Re: Strange SQL operation

Posted by Francesco Chicchiriccò <il...@apache.org>.
On 18/11/2013 13:49, Timo-V Hatakka wrote:
>> On 15/11/2013 09:32, Timo-V Hatakka wrote:
>>> Hi!
>>>
>>>>> we made some SQL level investigations about time consumption of 
>>>>> synchronization task processes. We started with an empty 
>>>>> repository and processed the same external user table (of 5 000 
>>>>> users) twice. First run was processed more quickly and the time 
>>>>> used for SQL operations was divided for handling of 
>>>>> ACT_GE_BYTEARRAY, UAttrValue, ACT_HI_ACTINST, Policy, SyncopeUser, 
>>>>> UAttr and USchema tables. The next run was 150%-200% slower and DB 
>>>>> operations took about 50% of the time. We noticed that almost 75% 
>>>>> of DB operations was used in SELECT query to table ACT_RU_TASK. 
>>>>> The exact SQL clause was "select * from ACT_RU_TASK where 
>>>>> PARENT_TASK_ID_ = 'XXXXX'". There is no index for PARENT_TASK_ID_ 
>>>>> column and in our runs the column value is always null. Is this 
>>>>> some kind of bug as it makes updates very slow?
>>>>>
>>>>
>>>> Hi Timo,
>>>> thanks for your SQL analysis.
>>>> BTW, which DBMS are you using? If MySQL, with InnoDB as default 
>>>> engine or not?
>>>
>>> MySQL with InnoDB.
>>
>> Yeah, I'd figured: if you have time, give a try to PostgreSQL; it is 
>> *really* open source and works much better IMHO.
>> Anyway...
>>
>>>> The table mentioned above involved in the critical queries 
>>>> (ACT_RU_TASK) is actually an Activiti ([1] the default user 
>>>> workflow engine) table, not under Syncope direct control: you can 
>>>> try to manually define some indexes and check if there is any 
>>>> significant improvement.
>>>>
>>>> Alternatively, we can see if usage of API call(s) triggering such 
>>>> query can be lowered from Syncope code - mainly the 
>>>> ActivitiUserWorkflowAdapter [2], but I'm dubious.
>>>
>>> I am aware that this can also be an activiti issue. But I think that 
>>> it is a little bit strange that there is a given PARENT_TASK_ID_ in 
>>> the query. Is this hierarchy maintained purely in activiti? No 
>>> possibility that you are calling some activiti operations with wrong 
>>> parameters? If you think so, I could define a new index and repeat 
>>> the test.
>>
>> I don't think there is any explicit call in the code ([2], please 
>> don't remove links when you reply or text might be left without 
>> references, as happening above) where something related to 
>> PARENT_TASK_ID is set.
>> I might be wrong, of course, so please take a look at the source code.
>>
>> As said above, I still think that adding new indexes (or any other 
>> simple tuning task) affecting Activiti tables is the first thing to 
>> attempt.
>
> I did it: "CREATE INDEX RU_TEST ON ACT_RU_TASK (PARENT_TASK_ID_)". It 
> helped!
> The task was processed much faster and now ACT_RU_TASK queries took 
> about 13% of the time (75% earlier).
>
> Thank you very much!

You're welcome: I think we need to share somehow your findings, so I 
have opened SYNCOPE-441 [3] in order to create such index by default at 
every deployment.

Thanks for reporting.

Regards.

>> [1] http://www.activiti.org
>> [2] 
>> http://svn.apache.org/repos/asf/syncope/branches/1_1_X/core/src/main/java/org/apache/syncope/core/workflow/user/activiti/ActivitiUserWorkflowAdapter.java
[3] https://issues.apache.org/jira/browse/SYNCOPE-441

-- 
Francesco Chicchiriccò

Tirasa - Open Source Excellence
http://www.tirasa.net/

ASF Member, Apache Syncope PMC chair, Apache Cocoon PMC Member
http://people.apache.org/~ilgrosso/


Re: Strange SQL operation

Posted by Timo-V Hatakka <ti...@helsinki.fi>.
> On 15/11/2013 09:32, Timo-V Hatakka wrote:
>> Hi!
>>
>>>> we made some SQL level investigations about time consumption of  
>>>> synchronization task processes. We started with an empty  
>>>> repository and processed the same external user table (of 5 000  
>>>> users) twice. First run was processed more quickly and the time  
>>>> used for SQL operations was divided for handling of  
>>>> ACT_GE_BYTEARRAY, UAttrValue, ACT_HI_ACTINST, Policy,  
>>>> SyncopeUser, UAttr and USchema tables. The next run was 150%-200%  
>>>> slower and DB operations took about 50% of the time. We noticed  
>>>> that almost 75% of DB operations was used in SELECT query to  
>>>> table ACT_RU_TASK. The exact SQL clause was "select * from  
>>>> ACT_RU_TASK where PARENT_TASK_ID_ = 'XXXXX'". There is no index  
>>>> for PARENT_TASK_ID_ column and in our runs the column value is  
>>>> always null. Is this some kind of bug as it makes updates very  
>>>> slow?
>>>>
>>>
>>> Hi Timo,
>>> thanks for your SQL analysis.
>>> BTW, which DBMS are you using? If MySQL, with InnoDB as default  
>>> engine or not?
>>
>> MySQL with InnoDB.
>
> Yeah, I'd figured: if you have time, give a try to PostgreSQL; it is  
> *really* open source and works much better IMHO.
> Anyway...
>
>>> The table mentioned above involved in the critical queries  
>>> (ACT_RU_TASK) is actually an Activiti ([1] the default user  
>>> workflow engine) table, not under Syncope direct control: you can  
>>> try to manually define some indexes and check if there is any  
>>> significant improvement.
>>>
>>> Alternatively, we can see if usage of API call(s) triggering such  
>>> query can be lowered from Syncope code - mainly the  
>>> ActivitiUserWorkflowAdapter [2], but I'm dubious.
>>
>> I am aware that this can also be an activiti issue. But I think  
>> that it is a little bit strange that there is a given  
>> PARENT_TASK_ID_ in the query. Is this hierarchy maintained purely  
>> in activiti? No possibility that you are calling some activiti  
>> operations with wrong parameters? If you think so, I could define a  
>> new index and repeat the test.
>
> I don't think there is any explicit call in the code ([2], please  
> don't remove links when you reply or text might be left without  
> references, as happening above) where something related to  
> PARENT_TASK_ID is set.
> I might be wrong, of course, so please take a look at the source code.
>
> As said above, I still think that adding new indexes (or any other  
> simple tuning task) affecting Activiti tables is the first thing to  
> attempt.

I did it: "CREATE INDEX RU_TEST ON ACT_RU_TASK (PARENT_TASK_ID_)". It helped!
The task was processed much faster and now ACT_RU_TASK queries took  
about 13% of the time (75% earlier).

Thank you very much!

Timo

>
> Regards.
>
> [1] http://www.activiti.org
> [2]  
> http://svn.apache.org/repos/asf/syncope/branches/1_1_X/core/src/main/java/org/apache/syncope/core/workflow/user/activiti/ActivitiUserWorkflowAdapter.java
>
> -- 
> Francesco Chicchiriccò
>
> Tirasa - Open Source Excellence
> http://www.tirasa.net/
>
> ASF Member, Apache Syncope PMC chair, Apache Cocoon PMC Member
> http://people.apache.org/~ilgrosso/




Re: Strange SQL operation

Posted by Francesco Chicchiriccò <il...@apache.org>.
On 15/11/2013 09:32, Timo-V Hatakka wrote:
> Hi!
>
>>> we made some SQL level investigations about time consumption of 
>>> synchronization task processes. We started with an empty repository 
>>> and processed the same external user table (of 5 000 users) twice. 
>>> First run was processed more quickly and the time used for SQL 
>>> operations was divided for handling of ACT_GE_BYTEARRAY, UAttrValue, 
>>> ACT_HI_ACTINST, Policy, SyncopeUser, UAttr and USchema tables. The 
>>> next run was 150%-200% slower and DB operations took about 50% of 
>>> the time. We noticed that almost 75% of DB operations was used in 
>>> SELECT query to table ACT_RU_TASK. The exact SQL clause was "select 
>>> * from ACT_RU_TASK where PARENT_TASK_ID_ = 'XXXXX'". There is no 
>>> index for PARENT_TASK_ID_ column and in our runs the column value is 
>>> always null. Is this some kind of bug as it makes updates very slow?
>>>
>>
>> Hi Timo,
>> thanks for your SQL analysis.
>> BTW, which DBMS are you using? If MySQL, with InnoDB as default 
>> engine or not?
>
> MySQL with InnoDB.

Yeah, I'd figured: if you have time, give a try to PostgreSQL; it is 
*really* open source and works much better IMHO.
Anyway...

>> The table mentioned above involved in the critical queries 
>> (ACT_RU_TASK) is actually an Activiti ([1] the default user workflow 
>> engine) table, not under Syncope direct control: you can try to 
>> manually define some indexes and check if there is any significant 
>> improvement.
>>
>> Alternatively, we can see if usage of API call(s) triggering such 
>> query can be lowered from Syncope code - mainly the 
>> ActivitiUserWorkflowAdapter [2], but I'm dubious.
>
> I am aware that this can also be an activiti issue. But I think that 
> it is a little bit strange that there is a given PARENT_TASK_ID_ in 
> the query. Is this hierarchy maintained purely in activiti? No 
> possibility that you are calling some activiti operations with wrong 
> parameters? If you think so, I could define a new index and repeat the 
> test.

I don't think there is any explicit call in the code ([2], please don't 
remove links when you reply or text might be left without references, as 
happening above) where something related to PARENT_TASK_ID is set.
I might be wrong, of course, so please take a look at the source code.

As said above, I still think that adding new indexes (or any other 
simple tuning task) affecting Activiti tables is the first thing to attempt.

Regards.

[1] http://www.activiti.org
[2] 
http://svn.apache.org/repos/asf/syncope/branches/1_1_X/core/src/main/java/org/apache/syncope/core/workflow/user/activiti/ActivitiUserWorkflowAdapter.java

-- 
Francesco Chicchiriccò

Tirasa - Open Source Excellence
http://www.tirasa.net/

ASF Member, Apache Syncope PMC chair, Apache Cocoon PMC Member
http://people.apache.org/~ilgrosso/


Re: Strange SQL operation

Posted by Timo-V Hatakka <ti...@helsinki.fi>.
Hi!

>> we made some SQL level investigations about time consumption of  
>> synchronization task processes. We started with an empty repository  
>> and processed the same external user table (of 5 000 users) twice.  
>> First run was processed more quickly and the time used for SQL  
>> operations was divided for handling of ACT_GE_BYTEARRAY,  
>> UAttrValue, ACT_HI_ACTINST, Policy, SyncopeUser, UAttr and USchema  
>> tables. The next run was 150%-200% slower and DB operations took  
>> about 50% of the time. We noticed that almost 75% of DB operations  
>> was used in SELECT query to table ACT_RU_TASK. The exact SQL clause  
>> was "select * from ACT_RU_TASK where PARENT_TASK_ID_ = 'XXXXX'".  
>> There is no index for PARENT_TASK_ID_ column and in our runs the  
>> column value is always null. Is this some kind of bug as it makes  
>> updates very slow?
>>
>
> Hi Timo,
> thanks for your SQL analysis.
> BTW, which DBMS are you using? If MySQL, with InnoDB as default  
> engine or not?

MySQL with InnoDB.

> The table mentioned above involved in the critical queries  
> (ACT_RU_TASK) is actually an Activiti ([1] the default user workflow  
> engine) table, not under Syncope direct control: you can try to  
> manually define some indexes and check if there is any significant  
> improvement.
>
> Alternatively, we can see if usage of API call(s) triggering such  
> query can be lowered from Syncope code - mainly the  
> ActivitiUserWorkflowAdapter [2], but I'm dubious.

I am aware that this can also be an activiti issue. But I think that  
it is a little bit strange that there is a given PARENT_TASK_ID_ in  
the query. Is this hierarchy maintained purely in activiti? No  
possibility that you are calling some activiti operations with wrong  
parameters? If you think so, I could define a new index and repeat the  
test.

Regards,
Timo