Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2010/07/07 21:38:50 UTC

[jira] Created: (CASSANDRA-1258) rebuild indexes after streaming

rebuild indexes after streaming
-------------------------------

                 Key: CASSANDRA-1258
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
             Project: Cassandra
          Issue Type: Sub-task
          Components: Core
            Reporter: Jonathan Ellis
             Fix For: 0.7


since index CFSes are "private," they won't be streamed with other sstables.  which is good, because the normal partitioner logic wouldn't stream the right parts anyway.

seems like the right solution is to extend SSTW.maybeRecover to rebuild indexes as well.  (this has the added benefit of being able to use streaming as a relatively straightforward "bulk loader.")
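To make the idea concrete, here is a minimal sketch of rebuilding an index from rows that arrived via streaming. It is a toy model only: rows are plain maps, and the names IndexRebuildSketch and rebuild are hypothetical, not taken from the Cassandra code base or from any patch on this issue.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, simplified model; none of these names come from Cassandra.
final class IndexRebuildSketch {
    private IndexRebuildSketch() {}

    /**
     * Rebuilds an inverted index (column name -> column value -> row keys)
     * from rows that were received by streaming. Only the columns listed
     * in indexedColumns are indexed; rows lacking a column are skipped for it.
     */
    static Map<String, Map<String, Set<String>>> rebuild(
            Map<String, Map<String, String>> streamedRows,
            Set<String> indexedColumns) {
        Map<String, Map<String, Set<String>>> index = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> row : streamedRows.entrySet()) {
            for (String column : indexedColumns) {
                String value = row.getValue().get(column);
                if (value == null) {
                    continue; // this row has no value for the indexed column
                }
                index.computeIfAbsent(column, c -> new TreeMap<>())
                     .computeIfAbsent(value, v -> new TreeSet<>())
                     .add(row.getKey());
            }
        }
        return index;
    }
}
```

Because the index is derived purely from the received rows, the receiving node never needs the sender's index sstables, which is the property that would also make streaming usable as a bulk loader.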

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894377#action_12894377 ] 

Hudson commented on CASSANDRA-1258:
-----------------------------------

Integrated in Cassandra #506 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/506/])
    rebuild secondary indexes after streaming.  patch by Nate McCall and jbellis for CASSANDRA-1258


> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: 1258-sstr-test.txt, 1258-v4.txt, 1258-v5.txt, 1258-v6.txt, 1258-v7.txt, 1258-v8.txt, trunk-1258-src-2.txt, trunk-1258-src-3.txt


[jira] Commented: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892879#action_12892879 ] 

Jonathan Ellis commented on CASSANDRA-1258:
-------------------------------------------

(I goofed and did not include SSTWT in v4, but it's unchanged from v3.)

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: 1258-v4.txt, trunk-1258-src-2.txt, trunk-1258-src-3.txt


[jira] Updated: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate McCall updated CASSANDRA-1258:
-----------------------------------

    Attachment: trunk-1258-src.txt

Patches for allowing CFS to accept a "recovered" SSTableReader from which to retrieve the indexed columns.

Having this on CFS allows for other uses, such as adding indexes after the fact and providing mbean hooks into rebuilding indexes.

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7
>
>         Attachments: trunk-1258-src.txt


[jira] Updated: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate McCall updated CASSANDRA-1258:
-----------------------------------

    Attachment:     (was: trunk-1258-src-2.txt)

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: trunk-1258-src-2.txt


[jira] Updated: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1258:
--------------------------------------

    Attachment: 1258-v4.txt

added code in v4 to flush the index CFSes before finalizing the index + filter.  getting test failure in SSTWT though -- not sure if the flush is exposing a problem in the test, or if it was already failing (didn't check).
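The ordering described here (flush the index stores first, finalize afterwards) can be illustrated with a toy model. Everything below is an assumption made for illustration: IndexStore, apply, flush and recover are hypothetical names, and this is not the code in 1258-v4.txt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of "flush index stores, then finalize"; not Cassandra's classes.
final class FlushBeforeFinalizeSketch {
    private FlushBeforeFinalizeSketch() {}

    /** Stand-in for an index column family store with an in-memory buffer. */
    static final class IndexStore {
        private final List<String> memtable = new ArrayList<>();
        private final List<String> onDisk = new ArrayList<>();

        void apply(String entry) {
            memtable.add(entry);
        }

        /** Moves every buffered entry to the "on disk" list. */
        void flush() {
            onDisk.addAll(memtable);
            memtable.clear();
        }

        boolean hasUnflushedEntries() {
            return !memtable.isEmpty();
        }

        List<String> onDisk() {
            return onDisk;
        }
    }

    /**
     * Applies the rebuilt index entries, flushes every index store, and only
     * then reports that finalization may proceed.
     */
    static boolean recover(List<String> entries, List<IndexStore> indexStores) {
        for (IndexStore store : indexStores) {
            for (String entry : entries) {
                store.apply(entry);
            }
        }
        for (IndexStore store : indexStores) {
            store.flush();
        }
        for (IndexStore store : indexStores) {
            if (store.hasUnflushedEntries()) {
                throw new IllegalStateException("index store not flushed before finalize");
            }
        }
        return true; // the finalization step would run at this point
    }
}
```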

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: 1258-v4.txt, trunk-1258-src-2.txt, trunk-1258-src-3.txt


[jira] Updated: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1258:
--------------------------------------

    Attachment: 1258-v8.txt

version that pushes CFMetadata into SSTable objects.  tests pass now.

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: 1258-v4.txt, 1258-v5.txt, 1258-v6.txt, 1258-v7.txt, 1258-v8.txt, trunk-1258-src-2.txt, trunk-1258-src-3.txt


[jira] Issue Comment Edited: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890589#action_12890589 ] 

Nate McCall edited comment on CASSANDRA-1258 at 7/21/10 12:54 PM:
------------------------------------------------------------------

Patches for allowing CFS to accept a "recovered" SSTableReader from which to retrieve the indexed columns.

Having this on CFS allows for other uses, such as adding indexes after the fact and providing mbean hooks into rebuilding indexes.

Edit: this won't flush correctly unless the patch in CASSANDRA-1301 is applied as well.

      was (Author: zznate):
    Patches for allowing CFS to accept a "recovered" SSTableReader from which to retrieve the indexed columns.

Having this on CFS allows for other uses such as added indexes after the fact, and providing mbean hooks into rebuilding indexes. 
  
> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7
>
>         Attachments: trunk-1258-src.txt


[jira] Commented: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891688#action_12891688 ] 

Jonathan Ellis commented on CASSANDRA-1258:
-------------------------------------------

give the SSTR approach a try, see if it works out.

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: trunk-1258-src-2.txt


[jira] Commented: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893712#action_12893712 ] 

Nate McCall commented on CASSANDRA-1258:
----------------------------------------

Initially I was hesitant to mess with the IO API, since most stuff there went through DD to get information. Having a member for the comparator (CFMD as well?) on SSTR would make the sstable plumbing underneath a lot cleaner.

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: 1258-v4.txt, 1258-v5.txt, 1258-v6.txt, 1258-v7.txt, trunk-1258-src-2.txt, trunk-1258-src-3.txt


[jira] Updated: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate McCall updated CASSANDRA-1258:
-----------------------------------

    Attachment: trunk-1258-src-2.txt

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: trunk-1258-src-2.txt


[jira] Commented: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893606#action_12893606 ] 

Jonathan Ellis commented on CASSANDRA-1258:
-------------------------------------------

thanks

but now it occurs to me, since the root of the problem is SSTableReader, shouldn't we push the comparator in there, instead of using if statements to avoid calling SSTR.getComparator?

sorry for the run-around...
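A rough sketch of what "pushing the comparator in" could look like: the reader is handed its comparator at construction time, so callers never need a conditional or a global lookup. The Reader class below is a hypothetical stand-in written for illustration, not the real SSTableReader, and it compares plain strings rather than column names.

```java
import java.util.Comparator;
import java.util.Objects;

// Hypothetical stand-in for an sstable reader; not the real SSTableReader.
final class ReaderComparatorSketch {
    private ReaderComparatorSketch() {}

    static final class Reader {
        private final String columnFamily;
        private final Comparator<String> comparator;

        /** The comparator is supplied by whoever opens the reader. */
        Reader(String columnFamily, Comparator<String> comparator) {
            this.columnFamily = Objects.requireNonNull(columnFamily);
            this.comparator = Objects.requireNonNull(comparator);
        }

        String columnFamily() {
            return columnFamily;
        }

        /** Works the same for public and "private" index column families. */
        Comparator<String> getComparator() {
            return comparator;
        }
    }
}
```

With this shape, code that needs to order columns asks the reader directly, and it makes no difference whether the column family is registered anywhere else.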

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: 1258-v4.txt, 1258-v5.txt, 1258-v6.txt, 1258-v7.txt, trunk-1258-src-2.txt, trunk-1258-src-3.txt


[jira] Commented: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893896#action_12893896 ] 

Jonathan Ellis commented on CASSANDRA-1258:
-------------------------------------------

Let's table this for now like you say.

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: 1258-v4.txt, 1258-v5.txt, 1258-v6.txt, 1258-v7.txt, trunk-1258-src-2.txt, trunk-1258-src-3.txt


[jira] Issue Comment Edited: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891316#action_12891316 ] 

Nate McCall edited comment on CASSANDRA-1258 at 7/22/10 4:28 PM:
-----------------------------------------------------------------

New patch file rebases on most recent trunk. Edit: added additional logging statement on completion for more visibility. 

      was (Author: zznate):
    New patch file rebases on most recent trunk
  
> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: trunk-1258-src-2.txt


[jira] Updated: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate McCall updated CASSANDRA-1258:
-----------------------------------

    Attachment: 1258-v7.txt

Limits scope of CFS passthrough

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: 1258-v4.txt, 1258-v5.txt, 1258-v6.txt, 1258-v7.txt, trunk-1258-src-2.txt, trunk-1258-src-3.txt


[jira] Commented: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893477#action_12893477 ] 

Jonathan Ellis commented on CASSANDRA-1258:
-------------------------------------------

let's not pass CFS objects where none is needed (all the sstable-related machinery)

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: 1258-v4.txt, 1258-v5.txt, 1258-v6.txt, trunk-1258-src-2.txt, trunk-1258-src-3.txt


[jira] Commented: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891683#action_12891683 ] 

Nate McCall commented on CASSANDRA-1258:
----------------------------------------

I had started adding this on SSTR initially, but there wasn't any other "query" stuff going on there, so it seemed out of place. Not a very compelling argument, but this was my first time with the plumbing. I have no problem moving it to SSTR, so let me know.

I had forgotten I left the constructor in there (I had started messing around with creating indexes after the fact and meant to take it out). Let me know about SSTR and I'll rebase and get rid of the constructor.

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: trunk-1258-src-2.txt


[jira] Assigned: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-1258:
-----------------------------------------

    Assignee: Nate McCall

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7


[jira] Updated: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate McCall updated CASSANDRA-1258:
-----------------------------------

    Attachment:     (was: trunk-1258-src.txt)

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7


[jira] Updated: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate McCall updated CASSANDRA-1258:
-----------------------------------

    Attachment: trunk-1258-src-2.txt

New patch file rebases on most recent trunk

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: trunk-1258-src-2.txt


[jira] Updated: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate McCall updated CASSANDRA-1258:
-----------------------------------

    Attachment: trunk-1258-src-3.txt

skip over BF creation to save some overhead

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: trunk-1258-src-2.txt, trunk-1258-src-3.txt


[jira] Commented: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892910#action_12892910 ] 

Nate McCall commented on CASSANDRA-1258:
----------------------------------------

So the flush, by putting this on disk in an sstable, triggers the loading of IColumnIterators via line 952 on ColumnFamilyStore. Without a flush, no SSTRs are present.

The issue with this is that DatabaseDescriptor (via getComparator() called via the line above) does not know about the "private" indexed CFs. 

Given the above, I don't think this has ever worked outside of a test harness of some sort (i.e., after an indexed CF is flushed and the call stack for CFS.scan is invoked).

Should DatabaseDescriptor look into the metadata to see if this is an indexed column and return the comparator that way?
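One way to picture the option raised in that question: a lookup that first checks the registered column families and, on a miss, falls back to index definitions recorded under each parent's metadata. The class below is a hypothetical stand-in and not the real DatabaseDescriptor; the registration methods, the string-keyed layout and the example names are all assumptions for illustration.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a schema registry; not the real DatabaseDescriptor.
final class ComparatorLookupSketch {
    // Column families that are registered directly.
    private final Map<String, Comparator<String>> columnFamilies = new HashMap<>();
    // parent CF name -> (private index CF name -> comparator); assumed layout.
    private final Map<String, Map<String, Comparator<String>>> indexesByParent = new HashMap<>();

    void registerColumnFamily(String name, Comparator<String> comparator) {
        columnFamilies.put(name, comparator);
    }

    void registerIndex(String parent, String indexName, Comparator<String> comparator) {
        indexesByParent.computeIfAbsent(parent, p -> new HashMap<>()).put(indexName, comparator);
    }

    /** Returns the comparator for a CF or index CF, or null if neither is known. */
    Comparator<String> getComparator(String name) {
        Comparator<String> direct = columnFamilies.get(name);
        if (direct != null) {
            return direct;
        }
        // Fallback: look through the index definitions held in parent metadata.
        for (Map<String, Comparator<String>> indexes : indexesByParent.values()) {
            Comparator<String> fromIndex = indexes.get(name);
            if (fromIndex != null) {
                return fromIndex;
            }
        }
        return null;
    }
}
```

Without the fallback loop, a lookup for a private index CF name would return null, which is the failure mode described above once a flush puts index data on disk.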

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: 1258-v4.txt, trunk-1258-src-2.txt, trunk-1258-src-3.txt
>
>
> since index CFSes are "private," they won't be streamed with other sstables.  which is good, because the normal partitioner logic wouldn't stream the right parts anyway.
> seems like the right solution is to extend SSTW.maybeRecover to rebuild indexes as well.  (this has the added benefit of being able to use streaming as a relatively straightforward "bulk loader.")



[jira] Updated: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate McCall updated CASSANDRA-1258:
-----------------------------------

    Attachment: 1258-v6.txt

1258-v6.txt Replaces CFMD with CFS passthrough

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: 1258-v4.txt, 1258-v5.txt, 1258-v6.txt, trunk-1258-src-2.txt, trunk-1258-src-3.txt
>
>
> since index CFSes are "private," they won't be streamed with other sstables.  which is good, because the normal partitioner logic wouldn't stream the right parts anyway.
> seems like the right solution is to extend SSTW.maybeRecover to rebuild indexes as well.  (this has the added benefit of being able to use streaming as a relatively straightforward "bulk loader.")



[jira] Commented: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893768#action_12893768 ] 

Jonathan Ellis commented on CASSANDRA-1258:
-------------------------------------------

going through DD is a wart, not a feature :)

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: 1258-v4.txt, 1258-v5.txt, 1258-v6.txt, 1258-v7.txt, trunk-1258-src-2.txt, trunk-1258-src-3.txt
>
>
> since index CFSes are "private," they won't be streamed with other sstables.  which is good, because the normal partitioner logic wouldn't stream the right parts anyway.
> seems like the right solution is to extend SSTW.maybeRecover to rebuild indexes as well.  (this has the added benefit of being able to use streaming as a relatively straightforward "bulk loader.")



[jira] Updated: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate McCall updated CASSANDRA-1258:
-----------------------------------

    Attachment: trunk-1258-src.txt

replacing patch file for minor code style change

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7
>
>         Attachments: trunk-1258-src.txt
>
>
> since index CFSes are "private," they won't be streamed with other sstables.  which is good, because the normal partitioner logic wouldn't stream the right parts anyway.
> seems like the right solution is to extend SSTW.maybeRecover to rebuild indexes as well.  (this has the added benefit of being able to use streaming as a relatively straightforward "bulk loader.")



[jira] Updated: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate McCall updated CASSANDRA-1258:
-----------------------------------

    Attachment: 1258-sstr-test.txt

adds coverage for recoverAndOpen

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: 1258-sstr-test.txt, 1258-v4.txt, 1258-v5.txt, 1258-v6.txt, 1258-v7.txt, 1258-v8.txt, trunk-1258-src-2.txt, trunk-1258-src-3.txt
>
>
> since index CFSes are "private," they won't be streamed with other sstables.  which is good, because the normal partitioner logic wouldn't stream the right parts anyway.
> seems like the right solution is to extend SSTW.maybeRecover to rebuild indexes as well.  (this has the added benefit of being able to use streaming as a relatively straightforward "bulk loader.")



[jira] Updated: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate McCall updated CASSANDRA-1258:
-----------------------------------

    Attachment: 1258-v5.txt

Passes the comparator and CFMetaData (which was also being looked up in DD) down into the slice iterators.

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: 1258-v4.txt, 1258-v5.txt, trunk-1258-src-2.txt, trunk-1258-src-3.txt
>
>
> since index CFSes are "private," they won't be streamed with other sstables.  which is good, because the normal partitioner logic wouldn't stream the right parts anyway.
> seems like the right solution is to extend SSTW.maybeRecover to rebuild indexes as well.  (this has the added benefit of being able to use streaming as a relatively straightforward "bulk loader.")



[jira] Commented: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892913#action_12892913 ] 

Jonathan Ellis commented on CASSANDRA-1258:
-------------------------------------------

no, we should get the comparator from CFS instead.  probably we should do that in getTopLevelColumns and pass it to getSSTableColumnIterator the way we do w/ getMemtableColumnIterator
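
A minimal sketch of the idea in the comment above, assuming simplified stand-in types rather than Cassandra's real classes: the store that owns the comparator hands it to the iterator it creates, so nothing depends on a global lookup that may not know about a "private" index CF. GlobalRegistry, Store and SliceIterator are hypothetical names invented for this illustration; the actual patch may look quite different.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; none of these are Cassandra's real classes.
final class ComparatorPassingSketch {

    /** Plays the role of a global descriptor that only knows declared CFs. */
    static final class GlobalRegistry {
        private static final Map<String, Comparator<String>> DECLARED = new HashMap<>();

        static void declare(String cfName, Comparator<String> comparator) {
            DECLARED.put(cfName, comparator);
        }

        static Comparator<String> getComparator(String cfName) {
            Comparator<String> comparator = DECLARED.get(cfName);
            if (comparator == null) {
                throw new IllegalStateException("unknown column family: " + cfName);
            }
            return comparator;
        }
    }

    /** Plays the role of an iterator over on-disk columns. */
    static final class SliceIterator {
        final Comparator<String> comparator;

        SliceIterator(Comparator<String> comparator) {
            this.comparator = comparator;
        }
    }

    /** Plays the role of a store that owns its comparator. */
    static final class Store {
        final String name;
        final Comparator<String> comparator;

        Store(String name, Comparator<String> comparator) {
            this.name = name;
            this.comparator = comparator;
        }

        // Fragile: fails for a "private" store the registry never heard of.
        SliceIterator iteratorViaGlobalLookup() {
            return new SliceIterator(GlobalRegistry.getComparator(name));
        }

        // Robust: the store hands down the comparator it already holds.
        SliceIterator iteratorWithOwnComparator() {
            return new SliceIterator(comparator);
        }
    }

    public static void main(String[] args) {
        GlobalRegistry.declare("Users", Comparator.<String>naturalOrder());
        Store index = new Store("Users.birthdate_idx", Comparator.<String>reverseOrder());

        // Works even though the registry has no entry for the index store.
        SliceIterator ok = index.iteratorWithOwnComparator();
        System.out.println(ok.comparator.compare("a", "b") > 0);

        try {
            index.iteratorViaGlobalLookup();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this toy version the global lookup throws for the undeclared index store, while the pass-down variant keeps working because it never consults the registry.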

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: 1258-v4.txt, trunk-1258-src-2.txt, trunk-1258-src-3.txt
>
>
> since index CFSes are "private," they won't be streamed with other sstables.  which is good, because the normal partitioner logic wouldn't stream the right parts anyway.
> seems like the right solution is to extend SSTW.maybeRecover to rebuild indexes as well.  (this has the added benefit of being able to use streaming as a relatively straightforward "bulk loader.")



[jira] Updated: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate McCall updated CASSANDRA-1258:
-----------------------------------

    Attachment: trunk-1258-src-3.txt

trunk-1258-src-3.txt does indexed column creation through SSTableWriter
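
To make "rebuilding an index" concrete, here is a rough sketch under heavy assumptions: rows are modeled as plain maps from column name to value, and the index is an in-memory sorted map from indexed value to row keys. IndexRebuildSketch and rebuildIndex are hypothetical names; this is not the code in the attached patch, which works on sstables rather than maps.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; real sstable recovery is far more involved.
final class IndexRebuildSketch {

    /**
     * Walks every row once and records, for the indexed column,
     * which row keys carry each value (value -> sorted row keys).
     */
    static SortedMap<String, SortedSet<String>> rebuildIndex(
            Map<String, Map<String, String>> rows, String indexedColumn) {
        SortedMap<String, SortedSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            String value = row.getValue().get(indexedColumn);
            if (value == null) {
                continue; // this row has no value for the indexed column
            }
            index.computeIfAbsent(value, v -> new TreeSet<>()).add(row.getKey());
        }
        return index;
    }
}
```

Because the index is derived entirely from the rows themselves, a node that receives rows by streaming could in principle recompute it locally, which is the property the ticket description relies on.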

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: trunk-1258-src-2.txt, trunk-1258-src-3.txt
>
>
> since index CFSes are "private," they won't be streamed with other sstables.  which is good, because the normal partitioner logic wouldn't stream the right parts anyway.
> seems like the right solution is to extend SSTW.maybeRecover to rebuild indexes as well.  (this has the added benefit of being able to use streaming as a relatively straightforward "bulk loader.")



[jira] Updated: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate McCall updated CASSANDRA-1258:
-----------------------------------

    Attachment:     (was: trunk-1258-src.txt)

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: trunk-1258-src-2.txt
>
>
> since index CFSes are "private," they won't be streamed with other sstables.  which is good, because the normal partitioner logic wouldn't stream the right parts anyway.
> seems like the right solution is to extend SSTW.maybeRecover to rebuild indexes as well.  (this has the added benefit of being able to use streaming as a relatively straightforward "bulk loader.")



[jira] Commented: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891679#action_12891679 ] 

Jonathan Ellis commented on CASSANDRA-1258:
-------------------------------------------

Seems like the best place to put this code is in SSTR.recoverAndOpen.  no?

What is the point of the refactoring to CFS constructor?  If it's not necessary for the feature, let's keep refactoring and new-feature-code in separate patches.

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: trunk-1258-src-2.txt
>
>
> since index CFSes are "private," they won't be streamed with other sstables.  which is good, because the normal partitioner logic wouldn't stream the right parts anyway.
> seems like the right solution is to extend SSTW.maybeRecover to rebuild indexes as well.  (this has the added benefit of being able to use streaming as a relatively straightforward "bulk loader.")



[jira] Updated: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate McCall updated CASSANDRA-1258:
-----------------------------------

    Attachment:     (was: trunk-1258-src-3.txt)

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: trunk-1258-src-2.txt, trunk-1258-src-3.txt
>
>
> since index CFSes are "private," they won't be streamed with other sstables.  which is good, because the normal partitioner logic wouldn't stream the right parts anyway.
> seems like the right solution is to extend SSTW.maybeRecover to rebuild indexes as well.  (this has the added benefit of being able to use streaming as a relatively straightforward "bulk loader.")



[jira] Commented: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894247#action_12894247 ] 

Nate McCall commented on CASSANDRA-1258:
----------------------------------------

SSTableReader.makeColumnFamily was still going through DD in a way that kept Indexed column CFs invisible - this came out through SSTableSliceIterator. This diff on SSTR will fix it:

561c561
<         return ColumnFamily.create(metadata);
---
>         return ColumnFamily.create(getTableName(), getColumnFamilyName());

which would have been caught a lot easier with SSTableWriterTest :-)




> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: 1258-v4.txt, 1258-v5.txt, 1258-v6.txt, 1258-v7.txt, 1258-v8.txt, trunk-1258-src-2.txt, trunk-1258-src-3.txt
>
>
> since index CFSes are "private," they won't be streamed with other sstables.  which is good, because the normal partitioner logic wouldn't stream the right parts anyway.
> seems like the right solution is to extend SSTW.maybeRecover to rebuild indexes as well.  (this has the added benefit of being able to use streaming as a relatively straightforward "bulk loader.")



[jira] Commented: (CASSANDRA-1258) rebuild indexes after streaming

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893870#action_12893870 ] 

Nate McCall commented on CASSANDRA-1258:
----------------------------------------

I would like to stick with the approach in v7 for the scope of this ticket. I just took a stab at the above (CFMD at SSTR creation time to provide the comparator and CFMD).

The changes are starting to reach into a lot of places: Memtable/BinaryMem., CompactionMgr (for SSTW.closeAndReopenReader), SST export, etc. 

I tried an initial set of changes with a fallback to DD when no CFMD was provided and got into a weird race condition that hung junit/ant. 

I'd like to put in a new ticket for 'SSTable initialization cleanup to avoid DD usage' for the above if you're cool with that, primarily so I can stick a fork in this and knock out the thrift update. 

> rebuild indexes after streaming
> -------------------------------
>
>                 Key: CASSANDRA-1258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1258
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Nate McCall
>             Fix For: 0.7 beta 1
>
>         Attachments: 1258-v4.txt, 1258-v5.txt, 1258-v6.txt, 1258-v7.txt, trunk-1258-src-2.txt, trunk-1258-src-3.txt
>
>
> since index CFSes are "private," they won't be streamed with other sstables.  which is good, because the normal partitioner logic wouldn't stream the right parts anyway.
> seems like the right solution is to extend SSTW.maybeRecover to rebuild indexes as well.  (this has the added benefit of being able to use streaming as a relatively straightforward "bulk loader.")
