Posted to notifications@accumulo.apache.org by "Josh Elser (JIRA)" <ji...@apache.org> on 2016/01/28 06:43:39 UTC

[jira] [Commented] (ACCUMULO-4066) Conditional mutation processing performance could be improved.

    [ https://issues.apache.org/jira/browse/ACCUMULO-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15120803#comment-15120803 ] 

Josh Elser commented on ACCUMULO-4066:
--------------------------------------

This one slipped in under my radar. Thought I'd give your changes a glance. 3x speed up is awesome!

{code}
-    for (Condition cond : cm.getConditions()) {
+    // sort conditions inorder to get better lookup performance. Sort on client side so tserver does not have to do it.
+    Condition[] ca = cm.getConditions().toArray(new Condition[cm.getConditions().size()]);
+    Arrays.sort(ca, CONDITION_COMPARATOR);
{code}

To confirm: the server doesn't rely on the sorted order, it just benefits from it for performance reasons?
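
For anyone following along, the client-side sort in the diff amounts to something like the sketch below. This is only an illustration: `Cond` is a hypothetical stand-in for Accumulo's `Condition`, and the ordering (family, then qualifier, then visibility) is an assumption; the actual `CONDITION_COMPARATOR` in the patch may order things differently.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortedConditionsSketch {

  // Hypothetical stand-in for a condition; only the fields used for ordering.
  static final class Cond {
    final String family;
    final String qualifier;
    final String visibility;

    Cond(String family, String qualifier, String visibility) {
      this.family = family;
      this.qualifier = qualifier;
      this.visibility = visibility;
    }

    @Override
    public String toString() {
      return family + ":" + qualifier + "[" + visibility + "]";
    }
  }

  // Assumed ordering: family, qualifier, visibility.
  static final Comparator<Cond> COND_ORDER =
      Comparator.comparing((Cond c) -> c.family)
          .thenComparing(c -> c.qualifier)
          .thenComparing(c -> c.visibility);

  // Copy the conditions into an array and sort the copy,
  // leaving the caller's list untouched.
  static Cond[] sortedCopy(List<Cond> conditions) {
    Cond[] ca = conditions.toArray(new Cond[0]);
    Arrays.sort(ca, COND_ORDER);
    return ca;
  }

  public static void main(String[] args) {
    List<Cond> conds = Arrays.asList(
        new Cond("meta", "seq", ""),
        new Cond("data", "b", ""),
        new Cond("data", "a", ""));
    System.out.println(Arrays.toString(sortedCopy(conds)));
  }
}
```

Sorting a copy on the client keeps the mutation's own condition list unchanged, so the sorted order stays an optimization hint and does not alter what the caller passed in.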

I see a lot of changes in IteratorUtil (I assume these relate to your point about loading iterators from the table config). How did this work before? You added lots of new tests for the other cases -- do we have good coverage for IteratorUtil already?

> Conditional mutation processing performance could be improved.
> --------------------------------------------------------------
>
>                 Key: ACCUMULO-4066
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4066
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: tserver
>    Affects Versions: 1.6.4, 1.7.0
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>             Fix For: 1.6.5, 1.7.1, 1.8.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When processing conditional mutations, tablet reads are done. The way the current implementation does tablet reads has a lot of overhead. For each condition, the following is done:
>  * Opens and reserves iterators files.
>  * Parses table iterators from table config (involves scanning and filtering entire table config)
>  * Merges condition iterators and table iterators
>  * Constructs iterator stack.
> I created a branch where these operations (except for constructing the iterator stack) are done per tablet and/or per batch of conditional mutations. With this change I am seeing a 3x speed up in conditional mutation processing rates when data is cached.
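
The restructuring described in the issue can be sketched roughly as follows. This is a minimal illustration, not the code from the branch: the class, the method names, and the config keys are hypothetical, and the "iterator stack" is reduced to a list of names. It only shows the config scan moving from once per condition to once per batch, while the stack is still built per condition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BatchSetupSketch {
  // Counts how often the (hypothetical) table config scan runs.
  static int configParses = 0;

  // Hypothetical stand-in for scanning and filtering the whole table config.
  static List<String> parseTableIterators(Map<String, String> tableConfig) {
    configParses++;
    List<String> iters = new ArrayList<>();
    for (Map.Entry<String, String> e : tableConfig.entrySet()) {
      if (e.getKey().startsWith("table.iterator.")) {
        iters.add(e.getValue());
      }
    }
    return iters;
  }

  // Merge table iterators with one condition's iterators (a list stands in for the stack).
  static List<String> buildStack(List<String> tableIters, List<String> conditionIters) {
    List<String> stack = new ArrayList<>(tableIters);
    stack.addAll(conditionIters);
    return stack;
  }

  // Before: the table config is scanned again for every condition.
  static List<List<String>> perCondition(Map<String, String> cfg, List<List<String>> conditions) {
    List<List<String>> stacks = new ArrayList<>();
    for (List<String> cond : conditions) {
      stacks.add(buildStack(parseTableIterators(cfg), cond));
    }
    return stacks;
  }

  // After: the scan is hoisted out of the loop; only stack construction stays per condition.
  static List<List<String>> perBatch(Map<String, String> cfg, List<List<String>> conditions) {
    List<String> tableIters = parseTableIterators(cfg);
    List<List<String>> stacks = new ArrayList<>();
    for (List<String> cond : conditions) {
      stacks.add(buildStack(tableIters, cond));
    }
    return stacks;
  }

  public static void main(String[] args) {
    Map<String, String> cfg = new TreeMap<>();
    cfg.put("table.iterator.scan.vers", "versioning"); // illustrative keys only
    cfg.put("table.file.replication", "3");
    List<List<String>> conds = Arrays.asList(
        Arrays.asList("condIterA"), Arrays.asList("condIterB"), Collections.<String>emptyList());

    perCondition(cfg, conds);
    System.out.println("per-condition config scans: " + configParses);
    configParses = 0;
    perBatch(cfg, conds);
    System.out.println("per-batch config scans: " + configParses);
  }
}
```

Both variants produce the same stacks; the difference is only in how many times the shared setup work runs.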


