Posted to commits@cassandra.apache.org by "Benjamin Coverston (JIRA)" <ji...@apache.org> on 2011/04/14 17:31:05 UTC

[jira] [Created] (CASSANDRA-2470) Secondary Indexes Build Very Slowly

Secondary Indexes Build Very Slowly
-----------------------------------

                 Key: CASSANDRA-2470
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2470
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Benjamin Coverston


While running repair I noticed that the run time was easily dominated by building the secondary indexes. They currently build at a rate of < 200KB/second, which means that indexing a 500MB file takes nearly an hour.
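
As a rough check of that figure, the sketch below works out the build time at the reported rate and at a rate ten times higher. The helper name `build_time_minutes` and the convention 1 MB = 1024 KB are assumptions made for illustration; since the reported rate is an upper bound ("< 200KB"), the first result is a lower bound on the actual build time.

```python
def build_time_minutes(size_mb: float, rate_kb_per_s: float) -> float:
    """Minutes needed to index size_mb megabytes at rate_kb_per_s.

    Assumes 1 MB = 1024 KB (an illustrative convention, not from the report).
    """
    seconds = size_mb * 1024 / rate_kb_per_s
    return seconds / 60


# Reported ceiling of 200 KB/s for a 500 MB file.
at_reported_rate = build_time_minutes(500, 200)
# The same file if the rate improved by an order of magnitude.
at_tenfold_rate = build_time_minutes(500, 2000)

print(f"{at_reported_rate:.1f} min at 200 KB/s")   # 42.7 min
print(f"{at_tenfold_rate:.1f} min at 2000 KB/s")   # 4.3 min
```

At anything below 200KB/second the build takes longer than about 43 minutes, which is consistent with "nearly an hour".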

Because this happens on the compaction thread it also causes repair to back up in general, which for very active systems is a very bad thing.

I suggest we look at it in the hope that we can improve the rate at which we build the indexes by an order of magnitude.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2470) Secondary Indexes Build Very Slowly

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021186#comment-13021186 ] 

Jonathan Ellis commented on CASSANDRA-2470:
-------------------------------------------

I'm thinking that our best bet here is CASSANDRA-2324 (if repair doesn't transfer a lot more data than necessary, index building will also be faster).

> Secondary Indexes Build Very Slowly
> -----------------------------------
>
>                 Key: CASSANDRA-2470
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2470
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benjamin Coverston
>              Labels: repair, secondary_index

[jira] [Resolved] (CASSANDRA-2470) Secondary Indexes Build Very Slowly

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-2470.
---------------------------------------

    Resolution: Duplicate

closing in favor of the above specific improvements
                

[jira] [Commented] (CASSANDRA-2470) Secondary Indexes Build Very Slowly

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127061#comment-13127061 ] 

Jonathan Ellis commented on CASSANDRA-2470:
-------------------------------------------

... Now that CASSANDRA-2324 and CASSANDRA-2498 are both done, how does the index build look?
                

[jira] [Commented] (CASSANDRA-2470) Secondary Indexes Build Very Slowly

Posted by "Jason Haruska (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020109#comment-13020109 ] 

Jason Haruska commented on CASSANDRA-2470:
------------------------------------------

This bug also affects adding new nodes to a cluster. In our case, the data streaming finishes in 20-30 minutes, while the secondary indexes then take 3-4 hours to build.
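
To put those numbers side by side, the following sketch computes how many times longer the index build takes than the streaming, using only the ranges quoted above. The function name `slowdown_range` is hypothetical and exists only for this illustration.

```python
def slowdown_range(stream_min: tuple, index_min: tuple) -> tuple:
    """Return (lowest, highest) ratio of index build time to streaming time.

    Both arguments are (shortest, longest) durations in minutes.
    """
    lowest = index_min[0] / stream_min[1]   # fastest build vs. slowest stream
    highest = index_min[1] / stream_min[0]  # slowest build vs. fastest stream
    return lowest, highest


# Streaming: 20-30 minutes; index build: 3-4 hours (180-240 minutes).
low, high = slowdown_range((20, 30), (180, 240))
print(f"index build takes {low:.0f}x to {high:.0f}x as long as streaming")
```

With these figures the index build is roughly 6 to 12 times slower than the streaming that precedes it.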


[jira] [Commented] (CASSANDRA-2470) Secondary Indexes Build Very Slowly

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021216#comment-13021216 ] 

Jonathan Ellis commented on CASSANDRA-2470:
-------------------------------------------

CASSANDRA-2498 may also help, but I caution that it is tagged ponies. :)


[jira] [Commented] (CASSANDRA-2470) Secondary Indexes Build Very Slowly

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127086#comment-13127086 ] 

Jonathan Ellis commented on CASSANDRA-2470:
-------------------------------------------

CASSANDRA-2897 is another possible optimization.
                