
Posted to commits@cassandra.apache.org by "T Jake Luciani (JIRA)" <ji...@apache.org> on 2010/07/11 17:59:49 UTC

[jira] Created: (CASSANDRA-1267) Improve performance of cached row slices

Improve performance of cached row slices
----------------------------------------

                 Key: CASSANDRA-1267
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1267
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: T Jake Luciani
            Priority: Minor
             Fix For: 0.7


In Lucandra, I have a use case that pulls all columns for a given row.

I've noticed that for rows with large numbers of columns this takes much longer than I would expect, given that row caching is enabled.

After looking into this, I see that the cached row is rebuilt and pruned even though I want all columns.
This patch skips the rebuild and prune step for this use case, and in my case it has improved performance significantly:

From ~400ms to ~50ms



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (CASSANDRA-1267) Improve performance of cached row slices

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1267:
--------------------------------------

    Attachment: 1267-v2.txt

Good start.  Here's my take on it:

 1) the business about counting only live columns is not optional; it breaks tests to take it out
 2) we can optimize removeDeleted more by not adding irrelevant columns in the first place
 3) we can simplify things by making callers who don't want the source CF modified explicitly clone first.  (this is just test code.)
 4) gcBefore is strictly increasing in real code, so directly modifying the cached row during removeDeleted is OK; test code can work around w/ (3) above

v2 attached.  [(2) is done by splitting removeDeleted (rD) into rDCF and rDColumnsOnly.  The full rD is only needed for supercolumn rows, since "don't add irrelevant subcolumns" is hard.]
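To illustrate points (2) through (4), here is a minimal sketch and not the code from the attached patches. The Column class, the keep() rule, and the use of a TreeMap as a stand-in for a row are all assumptions made for illustration. It contrasts copying a row and then pruning it with building the result from surviving columns only, and shows an in-place variant that callers must clone around if they need the original.

```java
import java.util.Map;
import java.util.TreeMap;

class RemoveDeletedSketch {
    /** Hypothetical column: live data or a tombstone with a local deletion time. */
    static final class Column {
        final boolean tombstone;
        final int localDeletionTime;

        Column(boolean tombstone, int localDeletionTime) {
            this.tombstone = tombstone;
            this.localDeletionTime = localDeletionTime;
        }
    }

    /** Assumed rule: live columns stay; a tombstone stays until gcBefore passes it. */
    static boolean keep(Column c, int gcBefore) {
        return !c.tombstone || c.localDeletionTime >= gcBefore;
    }

    /** Copy-then-prune: every column is copied, then the dead ones are removed again. */
    static TreeMap<String, Column> copyThenPrune(TreeMap<String, Column> source, int gcBefore) {
        TreeMap<String, Column> result = new TreeMap<>(source);
        result.values().removeIf(c -> !keep(c, gcBefore));
        return result;
    }

    /** Filtered build: irrelevant columns are never added in the first place. */
    static TreeMap<String, Column> buildFiltered(TreeMap<String, Column> source, int gcBefore) {
        TreeMap<String, Column> result = new TreeMap<>();
        for (Map.Entry<String, Column> e : source.entrySet()) {
            if (keep(e.getValue(), gcBefore)) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    /** In-place variant: mutates the source, so callers that need the original clone first. */
    static void pruneInPlace(TreeMap<String, Column> source, int gcBefore) {
        source.values().removeIf(c -> !keep(c, gcBefore));
    }
}
```

Both copyThenPrune and buildFiltered return the same columns; the second simply avoids inserting entries that would be removed again. Pruning in place is only safe if a later call never needs something an earlier call dropped, which is the reason given in (4) for relying on gcBefore increasing.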

> Improve performance of cached row slices
> ----------------------------------------
>
>                 Key: CASSANDRA-1267
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1267
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>            Priority: Minor
>             Fix For: 0.7
>
>         Attachments: 1267-v2.txt, cached-row-slice-perf-patch-1.txt

[jira] Commented: (CASSANDRA-1267) Improve performance of cached row slices

Posted by "T Jake Luciani (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909561#action_12909561 ] 

T Jake Luciani commented on CASSANDRA-1267:
-------------------------------------------

FYI, This didn't make it into 0.7-beta1 somehow

> Improve performance of cached row slices
> ----------------------------------------
>
>                 Key: CASSANDRA-1267
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1267
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>            Priority: Minor
>             Fix For: 0.7 beta 1
>
>         Attachments: 1267-v2.txt, 1267-v3.txt, cached-row-slice-perf-patch-1.txt

[jira] Updated: (CASSANDRA-1267) Improve performance of cached row slices

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1267:
--------------------------------------

    Attachment: 1267-v3.txt

> I thought the logic was redundant QueryFilter.isRelevant

the difference is, isRelevant needs to keep still-active tombstones, but those don't count for what the client sees

> IColumn c = cf.getColumnsMap().get(cname);

ah, right, I meant to keep that change but forgot when I was moving code around.  v3 attached.
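The distinction between relevant columns and counted columns can be sketched as follows. This is an illustration only: the Column class, the collect() helper, and the gcBefore comparison are hypothetical and are not taken from QueryFilter. Still-active tombstones are kept in the result, but only live columns count toward the requested limit.

```java
import java.util.ArrayList;
import java.util.List;

class LiveCountSketch {
    /** Hypothetical column: a name, a tombstone flag, and a local deletion time. */
    static final class Column {
        final String name;
        final boolean tombstone;
        final int localDeletionTime;

        Column(String name, boolean tombstone, int localDeletionTime) {
            this.name = name;
            this.tombstone = tombstone;
            this.localDeletionTime = localDeletionTime;
        }
    }

    /** Collects columns in order until `limit` live columns have been added. */
    static List<Column> collect(List<Column> sorted, int limit, int gcBefore) {
        List<Column> out = new ArrayList<>();
        int live = 0;
        for (Column c : sorted) {
            if (live >= limit) {
                break;
            }
            if (c.tombstone) {
                // A still-active tombstone is kept but does not count toward the limit.
                if (c.localDeletionTime >= gcBefore) {
                    out.add(c);
                }
                continue;
            }
            out.add(c);
            live++;
        }
        return out;
    }

    /** Number of columns the client would actually see. */
    static int liveCount(List<Column> columns) {
        int n = 0;
        for (Column c : columns) {
            if (!c.tombstone) {
                n++;
            }
        }
        return n;
    }
}
```

If the counter were incremented for every relevant column, a slice with a limit of 2 over a row containing a tombstone would return fewer live columns than requested.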


[jira] Commented: (CASSANDRA-1267) Improve performance of cached row slices

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890717#action_12890717 ] 

Hudson commented on CASSANDRA-1267:
-----------------------------------

Integrated in Cassandra #496 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/496/])
    


[jira] Updated: (CASSANDRA-1267) Improve performance of cached row slices

Posted by "T Jake Luciani (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

T Jake Luciani updated CASSANDRA-1267:
--------------------------------------

    Comment: was deleted

(was: FYI, This didn't make it into 0.7-beta1 somehow)


[jira] Updated: (CASSANDRA-1267) Improve performance of cached row slices

Posted by "T Jake Luciani (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

T Jake Luciani updated CASSANDRA-1267:
--------------------------------------

    Attachment: cached-row-slice-perf-patch-1.txt


[jira] Commented: (CASSANDRA-1267) Improve performance of cached row slices

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889879#action_12889879 ] 

Hudson commented on CASSANDRA-1267:
-----------------------------------

Integrated in Cassandra #494 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/494/])
    performance improvements to removeDeleted on read path.  patch by jbellis and tjake for CASSANDRA-1267



[jira] Commented: (CASSANDRA-1267) Improve performance of cached row slices

Posted by "T Jake Luciani (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889450#action_12889450 ] 

T Jake Luciani commented on CASSANDRA-1267:
-------------------------------------------

Looks great +1


[jira] Commented: (CASSANDRA-1267) Improve performance of cached row slices

Posted by "T Jake Luciani (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889435#action_12889435 ] 

T Jake Luciani commented on CASSANDRA-1267:
-------------------------------------------


1) I see.  I thought the logic was redundant with the QueryFilter.isRelevant call, so I moved the counter increment after this call.
2) Awesome. This will remove the rest of the latency I was seeing.
3 & 4) Sure.


I see you also removed the optimizations made to ColumnFamilyStore removeDeletedStandard() and removeDeletedSuper().
The supplied approach is definitely faster because it avoids the following call:

IColumn c = cf.getColumnsMap().get(cname);

For rows with millions of columns this is very slow compared to the getEntrySet() pair approach.
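The difference between the two access patterns can be sketched with plain java.util collections. This is an assumption-laden illustration, not the Cassandra code: a ConcurrentSkipListMap of integers stands in for the column map, and the method names are hypothetical. Looking each name up again costs one sorted-map search per column, while iterating over entries yields name and value together in a single pass.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

class EntryIterationSketch {
    /** Iterates names, then performs one map lookup per name. */
    static long sumByLookup(NavigableMap<String, Integer> columns) {
        long total = 0;
        for (String name : columns.keySet()) {
            total += columns.get(name); // extra search of the sorted map for every column
        }
        return total;
    }

    /** Iterates entries: each name arrives with its value, so no lookups are needed. */
    static long sumByEntries(NavigableMap<String, Integer> columns) {
        long total = 0;
        for (Map.Entry<String, Integer> e : columns.entrySet()) {
            total += e.getValue();
        }
        return total;
    }

    /** Builds a small sample row for trying out both methods. */
    static NavigableMap<String, Integer> sampleRow(int size) {
        NavigableMap<String, Integer> columns = new ConcurrentSkipListMap<>();
        for (int i = 0; i < size; i++) {
            columns.put("col" + i, i);
        }
        return columns;
    }
}
```

Both methods return the same result; only the amount of work per column differs, and that difference grows with the number of columns in the row.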


