You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oak-issues@jackrabbit.apache.org by "angela (JIRA)" <ji...@apache.org> on 2016/05/13 08:04:13 UTC

[jira] [Comment Edited] (OAK-4119) Improvements Take 1

    [ https://issues.apache.org/jira/browse/OAK-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194887#comment-15194887 ] 

angela edited comment on OAK-4119 at 5/13/16 8:03 AM:
------------------------------------------------------

h3. Benchmark Results for Patch OAK-4119_2 | OAK-4119_3
assuming that the benchmarks are suited to reflect a realistic scenario, the preliminary benchmark results seem to indicate the following effects of the proposed improvements.
Note that some benchmarks were run with patch 3 instead of 2 (marked accordingly in the attached results.

h5. Group.isMember
- usage of membership query positive for groups with lots of members
- for very small groups the query comes with some overhead
_Note_: while looking at the profiling and the query-code in {{IdentifierManager.getReferences}} again, I found the getReferences code overly complex and unnecessarily resolving the path to a tree and testing for node-type inheritance and testing for value type and matching against the searched uuid => will use a simplified, readable variant in patch OAK-4119_3.
_Note_: the profiling information for {{IsMemberTest}} counted around 10% in _org.apache.jackrabbit.oak.commons.jmx_ which i found really surprising given the fact that the code just executes queries. extract from benchmark:

{code}
at com.sun.jmx.mbeanserver.MBeanIntrospector.getPerInterface(MBeanIntrospector.java:158)
at com.sun.jmx.mbeanserver.MBeanSupport.<init>(MBeanSupport.java:149)
at com.sun.jmx.mbeanserver.StandardMBeanSupport.<init>(StandardMBeanSupport.java:81)
at javax.management.StandardMBean.construct(StandardMBean.java:169)
at javax.management.StandardMBean.<init>(StandardMBean.java:286)
at org.apache.jackrabbit.oak.commons.jmx.AnnotatedStandardMBean.<init>(AnnotatedStandardMBean.java:49)
at org.apache.jackrabbit.oak.query.QueryEngineSettings.<init>(QueryEngineSettings.java:54)
at org.apache.jackrabbit.oak.query.index.FilterImpl.<init>(FilterImpl.java:115)
at org.apache.jackrabbit.oak.plugins.index.reference.ReferenceIndex.lookup(ReferenceIndex.java:120)
at org.apache.jackrabbit.oak.plugins.index.reference.ReferenceIndex.query(ReferenceIndex.java:107)
at org.apache.jackrabbit.oak.query.ast.SelectorImpl.execute(SelectorImpl.java:331)
at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.fetchNext(QueryImpl.java:816)
at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.hasNext(QueryImpl.java:845)
{code}

h5. Group.isDeclaredMember
- it seems that the query optimization might only justified for groups with really huge number of members. the original threshold of 10 child nodes (i.e. 1000 members) is most likely too small; also the results from the slightly different benchmarks {{MemberIsDeclardMemberTest}} and {{IsDeclaredMemberTest}} leaves some doubts => additional bench-runs needed to identify an optimized  value and verify if the proposed changes are really justified.

h5. Group.addMember
- IMO no significant improvement for adding users (but see next point for real-batch mode of adding members)
- _Note_: the benchmarks related to adding members currently only add users a new members => therefore the removal of the duplicate circular check is not reflected in the bench-results.

h5. Group.addMembers
- it seems the proposed modification have a positive effect; specially for huge groups and batched adding. IMO that's not too surprising as the logic didn't changed but some overhead is omitted.
- _Again_: the benchmarks related to adding members currently only add users a new members => therefore the removal of the duplicate circular-group-in-group check is not reflected in the bench-results; that is expected have an additional positive effect specially in combination with {{Group.isMember}} improvements, which is called during {{UserValidator.checkForCyclicMembership}}

h5. Group.removeMember
=> see below

h5. Group.removeMembers
=> see below


was (Author: anchela):
h3. Benchmark Results for Patch OAK-4119_2 | OAK-4119_3
assuming that the benchmarks are suited to reflect a realistic scenario, the preliminary benchmark results seem to indicate the following effects of the proposed improvements.
Note that some benchmarks were run with patch 3 instead of 2 (marked accordingly in the attached results.

h5. Group.isMember
- usage of membership query positive for groups with lots of members
- for very small groups the query comes with some overhead
_Note_: while looking at the profiling and the query-code in {{IdentifierManager.getReferences}} again, I found the getReferences code overly complex and unnecessarily resolving the path to a tree and testing for node-type inheritance and testing for value type and matching against the searched uuid => will use a simplified, readable variant in patch OAK-4119_3.
_Note_: the profiling information for {{IsMemberTest}} counted around 10% in _org.apache.jackrabbit.oak.commons.jmx_ which i found really surprising given the fact that the code just executes queries. extract from benchmark:

{code}
at com.sun.jmx.mbeanserver.MBeanIntrospector.getPerInterface(MBeanIntrospector.java:158)
at com.sun.jmx.mbeanserver.MBeanSupport.<init>(MBeanSupport.java:149)
at com.sun.jmx.mbeanserver.StandardMBeanSupport.<init>(StandardMBeanSupport.java:81)
at javax.management.StandardMBean.construct(StandardMBean.java:169)
at javax.management.StandardMBean.<init>(StandardMBean.java:286)
at org.apache.jackrabbit.oak.commons.jmx.AnnotatedStandardMBean.<init>(AnnotatedStandardMBean.java:49)
at org.apache.jackrabbit.oak.query.QueryEngineSettings.<init>(QueryEngineSettings.java:54)
at org.apache.jackrabbit.oak.query.index.FilterImpl.<init>(FilterImpl.java:115)
at org.apache.jackrabbit.oak.plugins.index.reference.ReferenceIndex.lookup(ReferenceIndex.java:120)
at org.apache.jackrabbit.oak.plugins.index.reference.ReferenceIndex.query(ReferenceIndex.java:107)
at org.apache.jackrabbit.oak.query.ast.SelectorImpl.execute(SelectorImpl.java:331)
at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.fetchNext(QueryImpl.java:816)
at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.hasNext(QueryImpl.java:845)
{code}

h5. Group.isDeclaredMember
- it seems that the query optimization might only justified for groups with really huge number of members. the original threshold of 10 child nodes (i.e. 1000 members) is most likely too small; also the results from the slightly different benchmarks {{MemberIsDeclardMemberTest}} and {{IsDeclaredMemberTest}} leaves some doubts => additional bench-runs needed to identify an optimized  value and verify if the proposed changes are really justified.

h5. Group.addMember
- IMO no significant improvement for adding users (but see next point for real-batch mode of adding members)
- _Note_: the benchmarks related to adding members currently only add users a new members => therefore the removal of the duplicate circular check is not reflected in the bench-results.

h5. Group.addMembers
- it seems the proposed modification have a positive effect; specially for huge groups and batched adding. IMO that's not too surprising as the logic didn't changed but some overhead is omitted.
- _Again_: the benchmarks related to adding members currently only add users a new members => therefore the removal of the duplicate circular-group-in-group check is not reflected in the bench-results; that is expected have an additional positive effect specially in combination with {{Group.isMember}} improvements, which is called during {{UserValidator.checkForCyclicMembership}}

h5. Group.removeMember
_TODO_

h5. Group.removeMembers
_TODO_

> Improvements Take 1
> -------------------
>
>                 Key: OAK-4119
>                 URL: https://issues.apache.org/jira/browse/OAK-4119
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core
>            Reporter: angela
>            Assignee: angela
>             Fix For: 1.5.2
>
>         Attachments: AddMemberTest_20160314_152515.csv, AddMemberTest_with_batch_20160315_150518.csv, AddMembersTest_besteffort_20160314_163025.csv, IsDeclaredMemberTest_notNested_profiling_20160314_172728.csv, IsMemberTest_nested_profiling_20160315_134252.csv, IsMemberTest_notNested_profiling_20160315_115809.csv, MemberDeclaredMemberOf_nested_profiling_20160314_111037.csv, MemberDeclaredMemberOf_notNested_profiling_20160314_103003.csv, MemberIsDeclaredMember_notNested_profiling_20160314_115026.csv, MemberIsMember_nested_profiling_20160314_142612.csv, MemberIsMember_notNested_profiling_20160314_123150.csv, MemberMemberOf_nested_profiling_20160314_085505.csv, MemberMemberOf_notNested_profiling_20160314_094452.csv, OAK-4119.patch, OAK-4119_2.patch, OAK-4119_3.patch, OAK-4119_tests.patch, RemoveMemberTest_with_batch_20160512_161848.csv, RemoveMembersTest_besteffort_20160512_122352.csv
>
>
> First bunch of potential improvements to be tested/verified using the benchmarks:
> h4. General
> - dedicated index for {{rep:members}} property defined by nt {{rep:MemberReferences}}  (=> mod in {{UserInitializer}})
> - -adjust nt-handling in {{IdentifierManager.getReferences}} : if a single nt name is specified in the {{nodeTypeNames}} param it is inserted in the query statement instead of looking for {{nt:base}}-
> - simplified, new version of {{IdentifierManager.getReferences}} to match needs of membership lookup where propName and a single nt-name is given. see below (in OAK-4119_3.patch)
> - introduce {{hasAnyReferenceInPath}}, which allows to search for any references being located in a give subtree of the repository (escaping fixed in OAK-4119_3.patch)
> - {{MembershipProvider.MemberReferenceIterator}}: keeping track of processed references is optional i.e. an implementation detail
> - {{MembershipProvider.AbstractMemberIterator}}: no more protected fields (patch OAK-4119_2)
> - {{MembershipProvider.AbstractMemberIterator}}: breadth-first iteration before performing queries for the inherited members/membership (patch OAK-4119_2)
> h4. Group.isMember
> - {{MembershipProvider}}: introduce shortcut if a Group doesn't have any members
> -  {{MembershipProvider}}: keep old logic if the target Group has (potentially) pending changes 
> -  {{MembershipProvider}}: for unmodified Groups use reverse membership lookup of the target authorizable instead (patch OAK-4119_2: further optimizes 'hasMembership' as the {{AbstractMemberIterator}} doesn't need to search for the complete membership; limiting the number of queries to be executed. see above)
> - {{GroupImpl}}: add shortcut if member to be tested is a Group representing the {{EveryonePrincipal}}
> h4. Group.isDeclaredMember
> - introduce shortcut if a Group doesn't have any members
> - keep old logic if target Group (potentially) has pending changes or if the number of members is small (=> threshold with {{MembershipProvider}})
> - minor change with the old behavior, which does no longer keep track of processed references, which is not needed here
> - for unmodified Groups with plenty of members use {{IdentifierManager.hasAnyReferenceInPath}}
> - All modifications in {{MembershipProvider}}
> h4. Group.addMember(Authorizable)
> - {{GroupImpl}}: drop extra test for circular membership, which is anyway enforced by the {{UserValidator}} => needs adjustment of test-cases that expect this to be detected immediately => intoducing save-call to verify if cycles are properly detected latest during commit.
> - {{UserValidator}}: if {{isMember}} is improved by the changes above, the check for circular membership would be faster as well
> h4. Group.addMembers(String...)
> - introduce new methods on {{MembershipProvider}} and {{MembershipWriter}} that don't take a single contentID but rather a Map of contentID:memberID and returns the set of memberIDs that could not be processed successfully. 
> - consequently finding the best {{rep:members property}} and avoiding duplicate membership is performed only once for the batch of ids to be added
> - Note: once best-property reaches threshold new member-ref nodes will be created
> h4. Group.removeMembers(String...)
> - introduce new methods on {{MembershipProvider}} and {{MembershipWriter}} that don't take a single contentID but rather a Map of contentID:memberID and returns the set of memberIDs that could not be processed successfully.  
> - consequently for a given batch of ids the declared member are traversed only once



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)