Posted to commits@cassandra.apache.org by "Scott Fines (JIRA)" <ji...@apache.org> on 2012/10/31 18:13:17 UTC

[jira] [Created] (CASSANDRA-4886) Remote ColumnFamilyInputFormat

Scott Fines created CASSANDRA-4886:
--------------------------------------

             Summary: Remote ColumnFamilyInputFormat
                 Key: CASSANDRA-4886
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4886
             Project: Cassandra
          Issue Type: Improvement
          Components: Hadoop
    Affects Versions: 1.1.6
            Reporter: Scott Fines
             Fix For: 1.1.6


As written, the ColumnFamilyInputFormat does not have a great deal of fault tolerance. 

It only attempts to perform a read from a single replica, with an infinite timeout. If that replica is not available, then the Task fails, and must be retried on a different node.

This is fine if the TaskTrackers are colocated with Cassandra nodes, but it is very fragile when that is not possible. When the TaskTrackers are remote to Cassandra, the same rules that apply to clients should apply--there should be a strict (configurable) timeout, and the ability to retry a request on a different replica if a single request fails.
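As a rough illustration of the "strict (configurable) timeout" half of this request, here is a minimal Java sketch that bounds a single read with Future.get(timeout, unit) instead of waiting indefinitely. TimedCall and callWithTimeout are hypothetical names, and the Callable stands in for whatever Thrift call the record reader would actually make; this is not code from the attached patch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not from the patch): bounds one remote read by a
// configurable timeout instead of blocking forever on an unresponsive replica.
final class TimedCall {
    private TimedCall() {}

    static <T> T callWithTimeout(Callable<T> request, long timeoutMillis)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(request);
            try {
                // Wait at most timeoutMillis for the replica to answer.
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the stalled request
                throw e;             // let the caller decide whether to try another replica
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A TimeoutException from this helper is exactly the kind of error a retrying caller could catch before moving on to the next replica.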

It seems obvious that we'd want to support both types of architecture; to do that, we should probably have a configuration option that allows users to specify their architecture choice explicitly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-4886) Remote ColumnFamilyInputFormat

Posted by "Scott Fines (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Fines updated CASSANDRA-4886:
-----------------------------------

    Attachment: CASSANDRA-4886.path
    

[jira] [Updated] (CASSANDRA-4886) Remote ColumnFamilyInputFormat

Posted by "Scott Fines (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Fines updated CASSANDRA-4886:
-----------------------------------

    Comment: was deleted

(was: Here's a first attempt at allowing a configurable Client construction. I don't have any automated tests, unfortunately, but I *have* tested this against our development environment (6 nodes), and have it currently deployed in our production cluster (28 nodes), where it's been running for about a week with no problems. 

)
    

[jira] [Updated] (CASSANDRA-4886) Remote ColumnFamilyInputFormat

Posted by "Scott Fines (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Fines updated CASSANDRA-4886:
-----------------------------------

    Attachment:     (was: CASSANDRA-4886.path)
    

[jira] [Commented] (CASSANDRA-4886) Remote ColumnFamilyInputFormat

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488126#comment-13488126 ] 

Jonathan Ellis commented on CASSANDRA-4886:
-------------------------------------------

(Specifically, starting with https://issues.apache.org/jira/browse/CASSANDRA-2388?focusedCommentId=13104179&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13104179 )
                

[jira] [Commented] (CASSANDRA-4886) Remote ColumnFamilyInputFormat

Posted by "Scott Fines (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488132#comment-13488132 ] 

Scott Fines commented on CASSANDRA-4886:
----------------------------------------

This is not a new InputFormat; it is a modification to ColumnFamilyRecordReader. And yes, this patch also lacks good cross-DC support, in a similar way.

It seemed to me, when reading CASSANDRA-2388, that there is some difficulty in making a single implementation work well in both situations. Rather than attempt that, this patch simply lets the user choose whether or not the reader should behave as if it is node-local.
                

[jira] [Commented] (CASSANDRA-4886) Remote ColumnFamilyInputFormat

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488107#comment-13488107 ] 

Jonathan Ellis commented on CASSANDRA-4886:
-------------------------------------------

I'm skeptical that we need a separate InputFormat to handle this.  (See discussion on CASSANDRA-2388.)
                

[jira] [Commented] (CASSANDRA-4886) Remote ColumnFamilyInputFormat

Posted by "Scott Fines (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502027#comment-13502027 ] 

Scott Fines commented on CASSANDRA-4886:
----------------------------------------

This patch creates a client abstraction for accessing Cassandra during MR activities. There are two versions of the Client--LocalClient and RemoteClient. 

The LocalClient behaves exactly as the current CFRR does--it sends all requests to a single node and fails if any of those requests fail. This is ideal for situations when the TT is running on the same node as Cassandra. 

The RemoteClient, on the other hand, allows Thrift timeouts and other errors to occur without immediately failing the task. Instead, when certain exceptions are caught, it retries the same request on the next replica of that data, and fails only after it has tried every replica for that data. This is ideal when circumstances force you to run TTs separately from Cassandra nodes.
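The contrast between the two clients can be sketched in a few lines of Java. The class names LocalClient and RemoteClient mirror the comment, but the bodies, the SplitClient and ReplicaRequest interfaces, and the choice of IOException and TimeoutException as the "retryable" errors are all assumptions for illustration (stand-ins for Thrift transport and timeout errors), not the patch's actual code.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeoutException;

// Hypothetical abstraction: one read issued against a named replica host.
interface ReplicaRequest<T> {
    T execute(String host) throws Exception;
}

// Hypothetical client interface; the patch's real signatures are not shown in this thread.
interface SplitClient {
    <T> T execute(List<String> replicas, ReplicaRequest<T> request) throws Exception;
}

// Local mode: always talk to the first (co-located) replica; any failure fails the task.
final class LocalClient implements SplitClient {
    @Override
    public <T> T execute(List<String> replicas, ReplicaRequest<T> request) throws Exception {
        return request.execute(replicas.get(0));
    }
}

// Remote mode: on a retryable error, move on to the next replica;
// give up only after every replica has been tried.
final class RemoteClient implements SplitClient {
    @Override
    public <T> T execute(List<String> replicas, ReplicaRequest<T> request) throws Exception {
        Exception last = null;
        for (String host : replicas) {
            try {
                return request.execute(host);
            } catch (IOException | TimeoutException e) { // assumed stand-ins for Thrift errors
                last = e;
            }
        }
        throw new IOException("all " + replicas.size() + " replicas failed", last);
    }
}
```

Exceptions outside the retryable set propagate immediately from either client, which keeps genuine bugs (for example, a malformed request) from being retried across every replica.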

The default Client is the LocalClient, to maintain backwards compatibility. To use the RemoteClient, one would have to call

{code}
ConfigHelper.useRemoteClient(conf,true);
{code}
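A minimal sketch of how such a flag could default to local mode, assuming a plain java.util.Properties object in place of the Hadoop job configuration. The ClientModeConfig class and the key cassandra.input.remote.client are hypothetical; the comment only shows the ConfigHelper.useRemoteClient(conf,true) call, not how the patch stores the setting.

```java
import java.util.Properties;

// Hypothetical sketch: a job-level switch between local and remote client modes.
final class ClientModeConfig {
    // Invented key name, for illustration only.
    static final String REMOTE_CLIENT_KEY = "cassandra.input.remote.client";

    private ClientModeConfig() {}

    static void useRemoteClient(Properties conf, boolean remote) {
        conf.setProperty(REMOTE_CLIENT_KEY, Boolean.toString(remote));
    }

    // Defaults to false (local mode) so existing jobs keep their current behavior.
    static boolean isRemoteClient(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(REMOTE_CLIENT_KEY, "false"));
    }
}
```

Making local mode the default means a job only changes behavior when someone opts in, which matches the backwards-compatibility goal stated above.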

This is only a little different in effect from what Jake is proposing in CASSANDRA-2388. The main difference is that this patch requires the user to opt into remote mode explicitly, and it requires no changes to the current server interface. The pattern proposed in CASSANDRA-2388 also makes no explicit mention of failure modes for off-node Hadoop integration, which this patch is designed to handle.

Both approaches suffer from the same problems with multiple datacenters--the RemoteClient is even worse with multiple DCs, since it will try all replicas. The adjustment Jake proposes sounds like it would only affect the logic for constructing the replica list, which would be compatible with this patch's approach to fault tolerance.
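To make the multi-DC point concrete: if the replica list is built so that it contains only hosts in the job's own datacenter, a try-every-replica failover loop would never leave that DC. The sketch below is a hypothetical filter (ReplicaLists.localDcOnly, with the host-to-DC map supplied by the caller); it is not Jake's actual proposal from CASSANDRA-2388, whose details are not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical replica-list construction step: keep only replicas in the
// job's datacenter, preserving the original replica order.
final class ReplicaLists {
    private ReplicaLists() {}

    static List<String> localDcOnly(List<String> replicas, Map<String, String> dcOf, String localDc) {
        List<String> result = new ArrayList<>();
        for (String host : replicas) {
            // Hosts with no known DC are skipped rather than risk a cross-DC read.
            if (localDc.equals(dcOf.get(host))) {
                result.add(host);
            }
        }
        return result;
    }
}
```

The trade-off is that a split fails once every local replica is down instead of falling back to a remote DC; whether that is preferable depends on how expensive cross-DC reads are for the job.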

                

[jira] [Updated] (CASSANDRA-4886) Remote ColumnFamilyInputFormat

Posted by "Scott Fines (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Fines updated CASSANDRA-4886:
-----------------------------------

    Attachment: CASSANDRA-4886.patch

Here's an improved version of the patch.
                

[jira] [Commented] (CASSANDRA-4886) Remote ColumnFamilyInputFormat

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501879#comment-13501879 ] 

Jonathan Ellis commented on CASSANDRA-4886:
-------------------------------------------

Very confused.  What problem are you solving that 2388 would not solve, assuming someone implemented Jake's proposal?  What difference in behavior are you proposing for local vs non-local, and who chooses which behavior CFRR uses?
                