Posted to commits@cassandra.apache.org by "Jun Rao (JIRA)" <ji...@apache.org> on 2009/05/22 19:04:46 UTC

[jira] Created: (CASSANDRA-197) Expose ring map to client for more direct access

Expose ring map to client for more direct access
------------------------------------------------

                 Key: CASSANDRA-197
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Jun Rao
            Assignee: Jun Rao


For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.
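As a rough illustration of what "cache the ring map and use it to figure out the nodes owning a row" could look like on the client, here is a minimal sketch. It assumes tokens can be represented as long values and that a node owns the tokens up to and including its own; the class and method names are hypothetical and are not taken from any patch on this ticket.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical client-side ring map (token -> host); not Cassandra's own classes.
class RingMapSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String host) {
        ring.put(token, host);
    }

    // Assumed ownership rule: the owner is the node with the smallest token
    // greater than or equal to the key's token, wrapping around to the first
    // node when the key's token is larger than every node token.
    String ownerOf(long keyToken) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring is empty");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(keyToken);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

A client holding such a map could send a read straight to the host returned by ownerOf() instead of to an arbitrary node, which is where the saved hop would come from.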


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712847#action_12712847 ] 

Jonathan Ellis commented on CASSANDRA-197:
------------------------------------------

I don't think there's any question that if we had a patch adding the gossip feature I am positing, vs this one, that the gossip one is the Right Approach to take.  The only problem is that that is tricky code. :)

If you think about it that way, then I think it's clear that this patch is fine to use locally as a kludge to achieve what you want but it's not something we want to support as part of 0.4 and beyond.



[jira] Updated: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated CASSANDRA-197:
------------------------------

    Attachment: flexjson.jar
                patch197.v2

v2 patch:
1. Extended get_string_property to support querying "token map". The returned string is a JSON encoded <token, host> map.
2. A new class RingCache that caches the token map in a Java application. RingCache uses storage-conf.xml to obtain things like seeds, various ports, the partitioning strategy, and the replication factor. During initialization, RingCache makes a get_string_property call to the seed node to get the initial token-to-host map. It then reuses the logic in the server to map a row key to a list of hosts owning the row. If the token map changes, it's the application's responsibility to refresh RingCache by calling refreshEndPointMap(). RingCache doesn't start a Cassandra instance and can be embedded in any Java application.
3. An example of how to use RingCache is provided at test/unit/org.apache.cassandra.client.TestRingCache. For Hadoop integration, one can use RingCache in InputFormat to generate splits with locality.
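The shape of the cache described in the list above might be sketched roughly as follows. This is not the code in the patch: the fetcher stands in for the get_string_property call plus JSON decoding, tokens are simplified to long values, and replica placement is a plain clockwise walk that ignores racks. Only the name refreshEndPointMap() comes from the description; everything else is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

// Hypothetical stand-in for the RingCache described above; not the patch's code.
class RingCacheSketch {
    // Assumed to wrap the remote call that returns the token-to-host map.
    private final Supplier<Map<Long, String>> tokenMapFetcher;
    private final int replicationFactor;
    private volatile TreeMap<Long, String> tokenToHost = new TreeMap<>();

    RingCacheSketch(Supplier<Map<Long, String>> tokenMapFetcher, int replicationFactor) {
        this.tokenMapFetcher = tokenMapFetcher;
        this.replicationFactor = replicationFactor;
        refreshEndPointMap();
    }

    // The application calls this when it believes the token map has changed.
    void refreshEndPointMap() {
        tokenToHost = new TreeMap<>(tokenMapFetcher.get());
    }

    // Walk the ring clockwise from the key's token and collect up to
    // replicationFactor distinct hosts (simple, rack-unaware placement).
    List<String> getEndPoints(long keyToken) {
        TreeMap<Long, String> ring = tokenToHost; // work on one snapshot
        List<String> ordered = new ArrayList<>(ring.tailMap(keyToken, true).values());
        ordered.addAll(ring.headMap(keyToken, false).values());
        List<String> hosts = new ArrayList<>();
        for (String host : ordered) {
            if (hosts.size() == replicationFactor) {
                break;
            }
            if (!hosts.contains(host)) {
                hosts.add(host);
            }
        }
        return hosts;
    }
}
```

Because the cache only serves as a hint, a stale map just means a request may land on a node that has to forward it; calling refreshEndPointMap() brings the client back in line with the ring.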



[jira] Closed: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao closed CASSANDRA-197.
-----------------------------


> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: flexjson.jar, issue197.patchv1, patch197.v2, patch197.v3
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Eric Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725232#action_12725232 ] 

Eric Evans commented on CASSANDRA-197:
--------------------------------------

Voldemort does not use gossip, no. In fact, when last I looked at it, there was no failure detection at all. Each node is configured with the same list as the others; the thick client uses one or more seeds to fetch that list from a special store using the same get() semantics as any other object.

A Cassandra thick client could learn the ring via gossip and utilize the same failure detection, but obviously there'd need to be some sort of distinction since the client isn't part of the ring. I suppose you could also periodically poll this information from one or more of the nodes, but that sounds ugly.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712836#action_12712836 ] 

Sandeep Tata commented on CASSANDRA-197:
----------------------------------------

I'd be +1 on supporting an endpoint-for-key call in the API. 

That might be a good compromise between not including a lot of server code in the client, and still allowing the client the flexibility to behave in a more efficient way should it choose to.





[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712171#action_12712171 ] 

Jonathan Ellis commented on CASSANDRA-197:
------------------------------------------

-1 on violating encapsulation wholesale.

I could see maybe having a call "endpoint_for_key" that gives an ip/port pair for the primary node owning a key, but would that be useful enough to bother with?



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747116#action_12747116 ] 

Jonathan Ellis commented on CASSANDRA-197:
------------------------------------------

> I'm really hesitant about pushing JSON through a string.

Why?  That's what it's designed for.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744208#action_12744208 ] 

Jun Rao commented on CASSANDRA-197:
-----------------------------------

How about exposing the token-to-node map through the get_string_property() API?



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712176#action_12712176 ] 

Jun Rao commented on CASSANDRA-197:
-----------------------------------

If you look at the original Dynamo paper, it definitely talks about exposing key locations for shorter network path.

You can think of RingCache as part of the Cassandra client library. It's just a cache and only serves as hints. Note that Bigtable/HBase expose locality to clients too. Ideally, such exposure should be hidden in the client library. However, this is a bit hard to do in Cassandra since the client code is generated by Thrift. In any case, I don't see any harm in exposing such information.

As for the endpoint_for_key call, this will force the client to cache the location for every row, which is likely more expensive.
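To make the cost argument concrete, here is a hedged sketch of what a client built on an endpoint_for_key style call would end up doing. The lookup function is a hypothetical stand-in for the remote call, and the class is purely illustrative; the point is only that the cache grows with the number of distinct rows touched, whereas a cached token map grows with the number of nodes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-row location cache on top of an endpoint_for_key style lookup.
class PerKeyEndpointCache {
    private final Function<String, String> endpointForKey; // stands in for the remote call
    private final Map<String, String> cache = new HashMap<>();
    private int remoteCalls = 0;

    PerKeyEndpointCache(Function<String, String> endpointForKey) {
        this.endpointForKey = endpointForKey;
    }

    // One remote call and one cache entry per distinct row key.
    String locate(String key) {
        String host = cache.get(key);
        if (host == null) {
            remoteCalls++;
            host = endpointForKey.apply(key);
            cache.put(key, host);
        }
        return host;
    }

    int size() {
        return cache.size();
    }

    int remoteCalls() {
        return remoteCalls;
    }
}
```

With a thousand distinct keys this cache holds a thousand entries and has made a thousand lookups, while a token map for the same cluster would hold one entry per node.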





[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Michael Greene (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746541#action_12746541 ] 

Michael Greene commented on CASSANDRA-197:
------------------------------------------

The table description get_string_property returns a multi-level dictionary encoded the way JSON would encode it.  I'm not sure I agree with the de facto 'non-official APIs need to be strings' policy, but if it stands I'd imagine we would also encode the return value of this call the way JSON would.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747108#action_12747108 ] 

Jonathan Ellis commented on CASSANDRA-197:
------------------------------------------

    private static Logger LOG = Logger.getLogger(RingCache.class);

can we make this final and rename to "logger" for consistency w/ the rest of the code base?

        EndPoint[] endPoints = nodePicker_.getStorageEndPoints(partitioner_.getToken(key));
        return endPoints;

just "return nodePicker_.getStorageEndPoints(partitioner_.getToken(key))" is fine now that retrofitports is unnecessary.

these are minor, so +1 on committing w/o a new patch.




[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744266#action_12744266 ] 

Jonathan Ellis commented on CASSANDRA-197:
------------------------------------------

+1



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746940#action_12746940 ] 

Jonathan Ellis commented on CASSANDRA-197:
------------------------------------------

+1 on the approach in general.  some nits:

        partitioner_ = DatabaseDescriptor.getPartitioner();

inline this in declaration

        Map<Token,EndPoint>endpointMap

spacing

        for (String seed : seeds_)

shouldn't this break after getting an answer?  shouldn't need to contact all seeds each time

    private void retrofitPorts(EndPoint endPoints[])

this should be encapsulated in tokenmetadata (yeah, I know it's that way in existing code) -- have a cloneControl and cloneStorage method instead of forcing consumers of the api to know that they need to do this "magic" fixup later.
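The nit about the seed loop could be addressed along these lines: return as soon as one seed answers and only fall through to the next seed on failure. This is a sketch under assumptions, not the patch's code; fetchFromSeed is a hypothetical stand-in for the get_string_property call and is assumed to throw a RuntimeException when a seed cannot be reached.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class SeedQuerySketch {
    // Ask seeds in order and stop at the first one that returns a token map.
    static Map<String, String> fetchTokenMap(
            List<String> seeds,
            Function<String, Map<String, String>> fetchFromSeed) {
        RuntimeException last = null;
        for (String seed : seeds) {
            try {
                // First answer wins; remaining seeds are never contacted.
                return fetchFromSeed.apply(seed);
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next seed
            }
        }
        throw new IllegalStateException("no seed returned a token map", last);
    }
}
```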



[jira] Updated: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated CASSANDRA-197:
------------------------------

    Attachment: patch197.v3

patch v3. Fixed all the above comments.

Also, the retrofitport code in AbstractStrategy doesn't seem to be necessary since all EndPoints registered to tokenMetaData are already initialized to storagePort. We probably need to open another ticket to fix that.




[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712624#action_12712624 ] 

Jonathan Ellis commented on CASSANDRA-197:
------------------------------------------

It makes no sense to incur the disadvantages of making your client an add-on to the server code without going all the way.  The server already has perfectly good connection pooling code; it makes sense to use that instead of creating a separate pooling mechanism.  Nor does it make sense when you are building off the server jars to go through the extra thrift serialization; make the calls directly via MessagingService.

The right way to do this is to make the tweaks to the gossip layer necessary to let a specialized server-client get token ring information with server-level APIs.  I'm strongly against the kludge of doing this at the client layer.  Since the only consumers of that API will necessarily have access to the server codebase, keeping it in server APIs makes the most sense, and we already have a server API for distributing token information.




[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Eric Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713381#action_12713381 ] 

Eric Evans commented on CASSANDRA-197:
--------------------------------------

"So to me when I say a "client level api" I strictly mean one we expose via thrift to processes that are not expected to know anything about the Cassandra internals. As soon as you say "we should expose this to the client, but it will need to use (non-thrift) org.apache.cassandra classes and the server .xml file" then we are in violation of this implicit contract."

I agree with this. In fact, I was concerned we were doing exactly that (violating this implicit contract) when we added the new block_for arguments which assume client-side knowledge of the replication policy.

Personally, I'd hate to see us go any further in this direction.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744543#action_12744543 ] 

Jun Rao commented on CASSANDRA-197:
-----------------------------------

I was thinking of putting the RingCache code in contrib. RingCache will reuse server code such as the partitioner to determine which nodes a given key maps to. This will be a Java-only solution for now.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713034#action_12713034 ] 

Jonathan Ellis commented on CASSANDRA-197:
------------------------------------------

It will be a JVM-only option since that is what the server runs on, but that's less of an issue than you might think given how many languages have excellent JVM implementations (JRuby, Jython, ...)



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747170#action_12747170 ] 

Jun Rao commented on CASSANDRA-197:
-----------------------------------

From the tokenToEndpoint map, it's not too hard to create the RangeToNodes map. You can reuse the logic in the server by making methods like StorageService.getAllRanges() static.
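For readers who want to see the derivation without the server classes, a rough standalone sketch follows. It does not use StorageService.getAllRanges(); it assumes long tokens, renders each range as a "(left,right]" string, and fills in replicas with a simple clockwise walk that ignores racks. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RangeMapSketch {
    // Build a range -> nodes map from a sorted token -> endpoint map.
    // The node owning the right-hand token of a range is listed first,
    // followed by the next nodes clockwise, up to replicationFactor.
    static Map<String, List<String>> rangeToNodes(
            TreeMap<Long, String> tokenToEndpoint, int replicationFactor) {
        List<Long> tokens = new ArrayList<>(tokenToEndpoint.keySet());
        Map<String, List<String>> result = new LinkedHashMap<>();
        int n = tokens.size();
        for (int i = 0; i < n; i++) {
            long left = tokens.get((i + n - 1) % n); // previous token, wrapping around
            long right = tokens.get(i);
            List<String> nodes = new ArrayList<>();
            for (int j = 0; j < n && nodes.size() < replicationFactor; j++) {
                String host = tokenToEndpoint.get(tokens.get((i + j) % n));
                if (!nodes.contains(host)) {
                    nodes.add(host);
                }
            }
            result.put("(" + left + "," + right + "]", nodes);
        }
        return result;
    }
}
```

A Hadoop InputFormat could use a map like this to assign each range's split to one of the hosts listed for it.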



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712249#action_12712249 ] 

Jonathan Ellis commented on CASSANDRA-197:
------------------------------------------

Okay, so you're kind of taking a hybrid approach where the "client" is including large parts of the cassandra server code, and needs the server config file, but doesn't actually get the benefits of being part of the gossip layer.

I think that's fine as a halfway measure that you can use privately but the "right thing" is to go the other 10% of the way towards being a member of the cluster instead of hacking some of the data out via the client API.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747171#action_12747171 ] 

Jonathan Ellis commented on CASSANDRA-197:
------------------------------------------

> It doesn't [fit Jeff's needs] for reasons already discussed

I don't follow.  Over in CASSANDRA-342 I see

[Stu] Good call... the RingCache mentioned on CASSANDRA-197 is exactly what this ticket needs.

[Jeff] Yep, that is exactly what I was thinking about. I'll nose myself around over there.

I don't see where you explain how this doesn't work.  I'm sure I'm just missing something.

> I find it very odd that we think we're going to "hide" functionality through an obscure endpoint. And we actually want this to a) be simple to implement a client side to and b) "look and act" like the rest of the client code

No

> because we're going to use it for the hadoop code.

Yes.  One rule of API design is to make it easy to do the right thing and hard to do the wrong one.  99% of clients should not use this API and we do not want to encourage them.  Only Hadoop and some very specialized apps will want or need it.  Read Eric's comments above; this is not code that any Thrift client can just use with or without a "real" Thrift api; it requires embedding nontrivial parts of the Cassandra server (hence, the jvm as well, which immediately rules out many Thrift consumers).

> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: flexjson.jar, issue197.patchv1, patch197.v2, patch197.v3
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747111#action_12747111 ] 

Jeff Hodges commented on CASSANDRA-197:
---------------------------------------

I'm really hesitant about pushing JSON through a string. We're going to be using this extensively in the hadoop code, why not just make it a real endpoint and lose the JSON library? I can whip up a patch tonight. 

> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: flexjson.jar, issue197.patchv1, patch197.v2, patch197.v3
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746517#action_12746517 ] 

Jeff Hodges commented on CASSANDRA-197:
---------------------------------------

I'm assuming then that the type of the return value of get_string_property (or get_string_list_property) would change to accommodate a map instead of a simple string? Or am I missing something?

> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: issue197.patchv1
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Updated: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Michael Greene (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Greene updated CASSANDRA-197:
-------------------------------------

    Component/s: Core

> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: issue197.patchv1
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747149#action_12747149 ] 

Jeff Hodges commented on CASSANDRA-197:
---------------------------------------

I find it very odd that we think we're going to "hide" functionality through an obscure endpoint. And we actually want this to a) be simple to implement a client side to and b) "look and act" like the rest of the client code because we're going to use it for the hadoop code.

This rather odd idea that we can "hide" functionality by making it harder for someone to use, and the fact that we actually need it and want it working well for our own uses, seem to be counter to one another. Why not just make it easier on ourselves, and make it clean and simple?

> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: flexjson.jar, issue197.patchv1, patch197.v2, patch197.v3
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712293#action_12712293 ] 

Jun Rao commented on CASSANDRA-197:
-----------------------------------

I am not sure if we should make the client a member of the cluster and have it receive all the gossiped messages. For request routing, the client only cares about nodes being added or removed, which is infrequent. Making the client a member of the cluster exposes it to way too many irrelevant messages. An out-of-date cached ring map only affects performance, not correctness. In practice, it's probably good enough for each client to simply refresh its cached ring map, say, once per hour.

As for the client having to include the server code and config file, I don't think it's a big deal. The server jar is already needed for Java clients today and a client has to be aware of the thrift port. Further, each client has to explicitly maintain a list of cassandra servers to connect to (RingCache obviates that).

It is a bit awkward to expose the getRingMap api in thrift, since only the client library really needs to see it, not the actual client code. However, thrift seems to be the only simple way for a client to talk to the server. MessageService is too heavy to use just for infrequent server calls. Could we just document that the api is not intended for direct client usage?
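
The periodic refresh described in this comment can be sketched roughly as follows. This is only an illustration in Python, not the RingCache class from the patch: the class name RefreshingRingCache, the fetch_ring callable (standing in for whatever server call returns the ring map) and the injectable clock are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional


class RefreshingRingCache:
    """Caches a token -> endpoint map and re-fetches it after a fixed interval."""

    def __init__(self, fetch_ring: Callable[[], Dict[int, str]],
                 refresh_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_ring = fetch_ring          # hypothetical call to any server node
        self._refresh_seconds = refresh_seconds
        self._clock = clock                    # injectable so the sketch is testable
        self._ring: Dict[int, str] = {}
        self._fetched_at: Optional[float] = None

    def ring(self) -> Dict[int, str]:
        """Return the cached map, refreshing it first if it is older than the interval."""
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._refresh_seconds:
            self._ring = dict(self._fetch_ring())
            self._fetched_at = now
        return self._ring
```

Between refreshes the client may route a read to a node that no longer owns the key. Under the argument above, that read costs an extra hop but still returns the right data, because the contacted node forwards it.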




> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: issue197.patchv1
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724914#action_12724914 ] 

Jonathan Ellis commented on CASSANDRA-197:
------------------------------------------

Eric, how does Voldemort handle this?

> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: issue197.patchv1
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712851#action_12712851 ] 

Sandeep Tata commented on CASSANDRA-197:
----------------------------------------

The client participating in the gossip protocol without actually grabbing tokens is an interesting idea, but I'm not sure how it will work for non-Java clients. What happens to Python and Ruby clients? Will this be a Java-only option?


> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: issue197.patchv1
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713091#action_12713091 ] 

Jun Rao commented on CASSANDRA-197:
-----------------------------------

I feel like we are mixing the discussion of things at different levels. As a result, it is hard to see where the disagreement is. So, let me step back and put out an outline for further discussion.

1. Should we expose any locality to the client?

2. If we agree that some sort of locality is needed, what is the right api (independent of the implementation) to support such locality? I see two options here: expose per key locality or expose the whole ring map.

3. Finally what is the right way to implement locality? One option is to use thrift. With this option, the client has to pull periodically to refresh locality information. Another option is to extend MessageService. With this option, the server can potentially push new locality information to the client (when changes occur).

Let's discuss each of these a bit further and see if we can reach an agreement.

Here is my opinion on those.
For 1, I strongly support exposing locality to clients for better performance. Some of my preliminary tests showed up to 50% improvement for simple get_column calls when locality is enabled.

For 2, I favor exposing the whole ring map. The main reason is that this makes invalidating locations cached at the client easier and potentially cheaper.

For 3, I can see the benefit of using extended MessageService. I am not exactly sure what it takes to implement such an extension though.
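
To make the two options under point 2 concrete, here is a rough Python sketch of both API shapes and of what invalidation costs in each. Everything in it is an assumption made for illustration: the class names, the endpoint_for_key and fetch_ring callables standing in for server calls, and the convention that a key belongs to the first node whose token is greater than or equal to the key's token.

```python
from bisect import bisect_left
from typing import Callable, Dict, List, Tuple


class PerKeyLocality:
    """Option 1: ask the server for the owner of each key and cache the answer."""

    def __init__(self, endpoint_for_key: Callable[[str], str]) -> None:
        self._endpoint_for_key = endpoint_for_key  # hypothetical per-key server call
        self._owners: Dict[str, str] = {}

    def endpoint(self, key: str) -> str:
        if key not in self._owners:
            self._owners[key] = self._endpoint_for_key(key)
        return self._owners[key]

    def invalidate(self) -> None:
        # Every cached key has to be looked up again, one server call each.
        self._owners.clear()


class WholeRingLocality:
    """Option 2: fetch the whole ring once and resolve keys locally."""

    def __init__(self, fetch_ring: Callable[[], List[Tuple[int, str]]],
                 token_for_key: Callable[[str], int]) -> None:
        self._fetch_ring = fetch_ring        # hypothetical whole-ring server call
        self._token_for_key = token_for_key  # must match the server's partitioner
        self._ring: List[Tuple[int, str]] = []

    def endpoint(self, key: str) -> str:
        if not self._ring:
            self._ring = sorted(self._fetch_ring())
        tokens = [token for token, _ in self._ring]
        # First node whose token is >= the key's token, wrapping around the ring.
        index = bisect_left(tokens, self._token_for_key(key)) % len(self._ring)
        return self._ring[index][1]

    def invalidate(self) -> None:
        # A single server call refreshes the location of every key.
        self._ring = []
```

With the per-key shape the cache grows with the number of keys touched, while the ring-map shape holds one entry per node, which is the "easier and potentially cheaper" invalidation argued for above. The price is that the client has to compute tokens itself.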



> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: issue197.patchv1
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712216#action_12712216 ] 

Jun Rao commented on CASSANDRA-197:
-----------------------------------

What you said in the paragraph is not completely true. The purpose of RingCache is exactly to hide the detailed configurations from the client. RingCache picks up the setting from the server configuration file so it will adjust to things like partitioner changes (which likely require both server and client restart) accordingly. The client code doesn't have to change at all. Take a look at test/org.apache.cassandra.service.TestRingCache.java and see what you think.

Also, the RingCache layer is much thinner compared with the thrift client.


> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: issue197.patchv1
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713143#action_12713143 ] 

Jonathan Ellis commented on CASSANDRA-197:
------------------------------------------

The problem is "client" in a loose sense means "anything that can add data to or request data from the system."  So each cassandra node can be considered a "client" of the others which is meaningless.

So to me when I say a "client level api" I strictly mean one we expose via thrift to processes that are not expected to know anything about the Cassandra internals.  As soon as you say "we should expose this to the client, but it will need to use (non-thrift) org.apache.cassandra classes and the server .xml file" then we are in violation of this implicit contract.

So:

No, we should not expose any internals to thrift-based clients.  Other "client" processes based on the server internals can of course make use of those internals as much as they want, and the best way to do this is to have them use the existing apis as much as possible rather than adding hacks to shuttle some of this state from one server process to another via thrift.  Using the existing APIs promotes clean design and avoids other "wheel re-invention" problems as I described above.  ("The server already has perfectly good connection pooling code; it makes sense to use that instead of creating a separate pooling mechanism. Nor does it make sense when you are building off the server jars to go through the extra thrift serialization; make the calls directly via MessagingService [i.e., storageproxy].")

So having thought things through it is clear to me that my earlier suggestion ("i could see maybe having a call "endpoint_for_key" that gives an ip/port pair for primary node owning a key") is also a poor one, and I withdraw it.

> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: issue197.patchv1
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747433#action_12747433 ] 

Hudson commented on CASSANDRA-197:
----------------------------------

Integrated in Cassandra #177 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/177/])
    Expose ring map to client for more direct access; patch by junrao; reviewed by jbellis for 


> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: flexjson.jar, issue197.patchv1, patch197.v2, patch197.v3
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Eric Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744528#action_12744528 ] 

Eric Evans commented on CASSANDRA-197:
--------------------------------------

How is the client going to know the Partitioner, ReplicaPlacementStrategy, and ReplicationFactor? Are non-java clients expected to re-implement the Partitioner and ReplicaPlacementStrategy? 

Unless I'm missing something, a token-to-node map doesn't seem very useful in the general sense without these.
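
A small Python sketch of why the token-to-node map alone is not enough: the same map gives different replica sets depending on the replication factor and on how replicas are placed. The placement rule used here (primary owner plus the next distinct nodes clockwise) is only an assumed, simplified strategy for illustration, and replicas_for_token is a hypothetical helper, not a Cassandra function.

```python
from bisect import bisect_left
from typing import Dict, List


def replicas_for_token(ring: Dict[int, str], token: int,
                       replication_factor: int) -> List[str]:
    """Return the primary owner plus the next distinct nodes clockwise on the ring."""
    tokens = sorted(ring)
    # Primary owner: first node whose token is >= the given token, with wrap-around.
    start = bisect_left(tokens, token) % len(tokens)
    replicas: List[str] = []
    for offset in range(len(tokens)):
        node = ring[tokens[(start + offset) % len(tokens)]]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == replication_factor:
            break
    return replicas


ring = {0: "n1", 100: "n2", 200: "n3"}
one_copy = replicas_for_token(ring, 150, 1)    # only the primary owner
two_copies = replicas_for_token(ring, 150, 2)  # primary plus one neighbour
```

A client using a different placement rule, or the wrong replication factor, would compute a different list from the identical ring map, which is the concern raised above.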

> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: issue197.patchv1
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747164#action_12747164 ] 

Jeff Hodges commented on CASSANDRA-197:
---------------------------------------

It doesn't for reasons already discussed in CASSANDRA-342. See https://issues.apache.org/jira/browse/CASSANDRA-342?focusedCommentId=12745809&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12745809

> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: flexjson.jar, issue197.patchv1, patch197.v2, patch197.v3
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747117#action_12747117 ] 

Jun Rao commented on CASSANDRA-197:
-----------------------------------

Well, to return an endpoint map directly, we need to add a new thrift api. Also, to use RingCache with Hadoop, you already need to make the cassandra jar and all its dependent jars available to Hadoop, so adding another json library doesn't seem like a big deal.
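
The trade-off here, shipping the endpoint map as a JSON string through an existing string-valued call instead of adding a typed api, looks roughly like this in Python. The payload layout (token string mapped to endpoint string) and the encode_ring / decode_ring helpers are assumptions for illustration; the actual format used by the patch is not shown in this thread.

```python
import json
from typing import Dict


def encode_ring(ring: Dict[str, str]) -> str:
    """Server side: flatten the token -> endpoint map into one string value."""
    return json.dumps(ring, sort_keys=True)


def decode_ring(payload: str) -> Dict[str, str]:
    """Client side: parse the string back and check it has the expected shape."""
    ring = json.loads(payload)
    if not isinstance(ring, dict):
        raise ValueError("expected a JSON object mapping tokens to endpoints")
    for token, endpoint in ring.items():
        if not isinstance(endpoint, str):
            raise ValueError("endpoint for token %r is not a string" % token)
    return ring


payload = encode_ring({"0": "10.0.0.1", "100": "10.0.0.2"})
ring = decode_ring(payload)
```

The type checks that a typed endpoint would provide for free have to be written by hand on the client, which is the cost of the string approach; the benefit is that no new api is added.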

> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: flexjson.jar, issue197.patchv1, patch197.v2, patch197.v3
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Reopened: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Michael Greene (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Greene reopened CASSANDRA-197:
--------------------------------------


It looks like we'll be going with the more limited string property version.  Re-opening.

> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: issue197.patchv1
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712179#action_12712179 ] 

Jonathan Ellis commented on CASSANDRA-197:
------------------------------------------

yes, dynamo and bigtable both do that, but we're implementing a much thinner client than they did.  you sometimes have higher latency but it's much easier to write a client.

I don't think this is something we want to expose.  The mapping between tokens and keys is pluggable and should be treated as a black box by the client.  Violating that encapsulation gives the client a really really big gun with which to shoot himself in the foot for a rather small benefit.  (Oh, you switched from RP to OPP?  Now you have to rewrite all your calls because you were manually mapping keys to nodes, sorry.  Let alone writing your own Partitioner.)
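
The RP-to-OPP hazard can be illustrated with a toy Python sketch. The two token functions below only imitate the idea of a random (hash-based) and an order-preserving partitioner; they are not the real implementations, and the ring contents and the route helper are made up for the example.

```python
import hashlib
from bisect import bisect_left
from typing import Any, List, Tuple


def random_token(key: str) -> int:
    """Hash-based token: neighbouring keys land at unrelated ring positions."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def order_preserving_token(key: str) -> str:
    """Order-preserving token: the key itself, so key order is ring order."""
    return key


def owner(ring: List[Tuple[Any, str]], token: Any) -> str:
    """First node whose token is >= the given token, wrapping around the ring."""
    tokens = [t for t, _ in ring]
    return ring[bisect_left(tokens, token) % len(ring)][1]


# Hypothetical rings; each must be sorted by token and use that partitioner's token type.
RANDOM_RING = [(2**128 // 3, "node1"), (2 * 2**128 // 3, "node2"), (2**128 - 1, "node3")]
ORDERED_RING = [("g", "node1"), ("p", "node2"), ("z", "node3")]


def route(key: str, partitioner: str) -> str:
    """Client-side routing that has to know which partitioner the cluster uses."""
    if partitioner == "random":
        return owner(RANDOM_RING, random_token(key))
    return owner(ORDERED_RING, order_preserving_token(key))
```

A client that hard-codes one of these token functions silently routes to the wrong node after the cluster switches to the other, since even the token type changes.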

If you wanted to go for a more dynamo-like experience you could add a "gossip only" mode where a client "node" asks to be part of the gossip network but with no token of its own.  That way it would be as current as the rest of the cluster about ring status, and it would be very clear that if you were writing this kind of client you need to know cassandra internals and There Be Dragons Here.

> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: issue197.patchv1
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Updated: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated CASSANDRA-197:
------------------------------

    Attachment: issue197.patchv1

Attach a patch.
--a new thrift api for obtaining the ring map.

--a RingCache class for the client to easily cache the ring map.

--an example test/org.apache.cassandra.service.TestRingCache.java that uses RingCache

> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: issue197.patchv1
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725221#action_12725221 ] 

Jonathan Ellis commented on CASSANDRA-197:
------------------------------------------

I tried to google this but was stymied by the Rowling fanboys' seo noise:

Does Voldemort make the fat clients part of the gossip network, or do the clients periodically refresh from the server nodes?  Does Voldemort even use gossip?

> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: issue197.patchv1
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747153#action_12747153 ] 

Jun Rao commented on CASSANDRA-197:
-----------------------------------

The intention of the new class RingCache is to make it easy to exploit the exposed endpoint map. You just need to instantiate it and then call getEndPoint() to get the endpoints for a given key. Take a look at it and see if it fits your need.

> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: flexjson.jar, issue197.patchv1, patch197.v2, patch197.v3
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.



[jira] Commented: (CASSANDRA-197) Expose ring map to client for more direct access

Posted by "Eric Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725212#action_12725212 ] 

Eric Evans commented on CASSANDRA-197:
--------------------------------------

Voldemort does have an RPC-ish remote API like we do, and then they have a "thick client".

The thick client uses the same bootstrapping mechanism that nodes do, so it knows each of them, their location in the ring, which partitions they are responsible for, etc. It knows of all the configured stores, the replication policy, and the serialization used for keys and values.

Reads and writes are made directly against the nodes, so for example, if you do a read where R=3, the client API determines where all of the copies are and attempts to fetch 3 of them concurrently. If the corresponding store is configured for string serialization of keys and Java serialization of values, the client API transparently deserializes them accordingly.
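
As a rough illustration of that read path (not Voldemort's actual client code), the sketch below sends a read to every replica concurrently and returns once R responses have arrived. The class ReplicaReadSketch, the fetch function, and the timeout parameter are hypothetical names invented for the example, and querying all replicas rather than exactly R of them is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical thick-client read; NOT taken from Voldemort or Cassandra.
class ReplicaReadSketch {
    // Queries all replicas in parallel and returns the first r successful values.
    static <V> List<V> read(List<String> replicas, int r,
                            Function<String, V> fetch, long timeoutMillis) {
        if (r < 1 || replicas.size() < r) {
            throw new IllegalArgumentException("need at least r replicas");
        }
        ExecutorService pool = Executors.newFixedThreadPool(replicas.size());
        CompletionService<V> completed = new ExecutorCompletionService<>(pool);
        try {
            for (String node : replicas) {
                completed.submit(() -> fetch.apply(node));
            }
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            List<V> results = new ArrayList<>();
            for (int i = 0; i < replicas.size() && results.size() < r; i++) {
                // Wait for whichever replica answers next, up to the deadline.
                Future<V> done = completed.poll(deadline - System.nanoTime(), TimeUnit.NANOSECONDS);
                if (done == null) {
                    break; // timed out
                }
                try {
                    results.add(done.get());
                } catch (ExecutionException e) {
                    // This replica failed; keep waiting for the others.
                }
            }
            if (results.size() < r) {
                throw new IllegalStateException("only " + results.size() + " of " + r + " reads succeeded");
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading", e);
        } finally {
            pool.shutdownNow(); // abandon any reads still in flight
        }
    }
}
```

Reconciling the R values (for example, picking the newest version) and deserializing them per store would sit on top of this and are omitted.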

Obviously the thick client and the nodes share a great deal of common code, which means it is Java-only.

> Expose ring map to client for more direct access
> ------------------------------------------------
>
>                 Key: CASSANDRA-197
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-197
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: issue197.patchv1
>
>
> For certain applications, it would be nice if a read is sent to a node that owns the data locally. This saves an extra network hop. To do that, a client will need to cache the ring map and use it to figure out the nodes owning a row.
