You are viewing a plain text version of this content. The canonical link for it is here.
Posted to pr@cassandra.apache.org by "maxim-chn (via GitHub)" <gi...@apache.org> on 2023/06/13 19:51:27 UTC

[GitHub] [cassandra] maxim-chn opened a new pull request, #2415: Replica nodes for batch operation are picked by response time

maxim-chn opened a new pull request, #2415:
URL: https://github.com/apache/cassandra/pull/2415

   [CASSANDRA-18120](https://issues.apache.org/jira/browse/CASSANDRA-18120). Random replica node pick for batch operations was changed to a pick by the nodes' response time. 
   
   Motivation:
   
   - Avoid any potential followup change by the end-users.
   - Minimize scope of the change.
   - Do not introduce expensive logic.
   
   The idea is to use readily-available interpretation of the potential replica node's communication time as a proxy indicator.  
   It indicates if the node will take longer to complete batch operation than other nodes.
   
   Value [PHI](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.80.7427&rep=rep1&type=pdf) is used as an indicator. It is calculated during gossip. The longer it takes node to complete gossip operation, the higher PHI value gets.  
   The assumption here is that longer gossip completion surely indicates a longer batch operation completion. Additional PHI description is available at [CASSANDRA-2597](https://issues.apache.org/jira/browse/CASSANDRA-2597)'s comment section.
   
   patch by Maxim Chanturiay; reviewed by <Reviewers> for CASSANDRA-18120
   
   Co-authored-by: Dan Sarisky <email1>
   
   The [Cassandra Jira](https://issues.apache.org/jira/projects/CASSANDRA/issues/)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] smiklosovic commented on a diff in pull request #2415: Replica nodes for batch operation are picked by response time

Posted by "smiklosovic (via GitHub)" <gi...@apache.org>.
smiklosovic commented on code in PR #2415:
URL: https://github.com/apache/cassandra/pull/2415#discussion_r1247045856


##########
src/java/org/apache/cassandra/gms/IFailureDetector.java:
##########
@@ -37,6 +40,28 @@ public interface IFailureDetector
      */
     public boolean isAlive(InetAddressAndPort ep);
 
+    /**
+     * Sorts the endpoints based on their response time.
+     * Nodes at smaller indexes have faster response time than nodes at higher indexes.
+     * @param List<InetAddressAndPort> endpoints.
+     * @return void.
+     */
+    public default void sortEndpointsByResponseTime(List<InetAddressAndPort> endpoints)

Review Comment:
   @maxim-chn does this mean that if I call this method, it will throw an exception because `isEpAResponseTimeSlowerThanEpB` is not implemented? So by default it will throw exception?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] smiklosovic commented on a diff in pull request #2415: Replica nodes for batch operation are picked by response time

Posted by "smiklosovic (via GitHub)" <gi...@apache.org>.
smiklosovic commented on code in PR #2415:
URL: https://github.com/apache/cassandra/pull/2415#discussion_r1247045856


##########
src/java/org/apache/cassandra/gms/IFailureDetector.java:
##########
@@ -37,6 +40,28 @@ public interface IFailureDetector
      */
     public boolean isAlive(InetAddressAndPort ep);
 
+    /**
+     * Sorts the endpoints based on their response time.
+     * Nodes at smaller indexes have faster response time than nodes at higher indexes.
+     * @param List<InetAddressAndPort> endpoints.
+     * @return void.
+     */
+    public default void sortEndpointsByResponseTime(List<InetAddressAndPort> endpoints)

Review Comment:
   @maxim-chn do this mean that if I call this method, it will throw an exception because `isEpAResponseTimeSlowerThanEpB` is not implemented? So by default it will throw exception?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] maxim-chn commented on a diff in pull request #2415: Replica nodes for batch operation are picked by response time

Posted by "maxim-chn (via GitHub)" <gi...@apache.org>.
maxim-chn commented on code in PR #2415:
URL: https://github.com/apache/cassandra/pull/2415#discussion_r1249247460


##########
src/java/org/apache/cassandra/gms/IFailureDetector.java:
##########
@@ -37,6 +40,28 @@ public interface IFailureDetector
      */
     public boolean isAlive(InetAddressAndPort ep);
 
+    /**
+     * Sorts the endpoints based on their response time.
+     * Nodes at smaller indexes have faster response time than nodes at higher indexes.
+     * @param List<InetAddressAndPort> endpoints.
+     * @return void.
+     */
+    public default void sortEndpointsByResponseTime(List<InetAddressAndPort> endpoints)

Review Comment:
   @smiklosovic , yes indeed.
   The motivation is to encourage any future implementation of `IFailureDetector` (besides `FailureDetector`) to take into consideration how 2 endpoints are compared in terms of response time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


Re: [PR] Replica nodes for batch operation are picked by response time [cassandra]

Posted by "shunsaker (via GitHub)" <gi...@apache.org>.
shunsaker commented on code in PR #2415:
URL: https://github.com/apache/cassandra/pull/2415#discussion_r1484288959


##########
src/java/org/apache/cassandra/locator/ReplicaPlans.java:
##########
@@ -302,16 +302,30 @@ public static Collection<InetAddressAndPort> filterBatchlogEndpoints(String loca
         }
         else
         {
-            racks = Lists.newArrayList(validated.keySet());
-            shuffle.accept((List<?>) racks);
+            ArrayList<String> sortedRacks = Lists.newArrayList(validated.keySet());
+            sortedRacks.sort((String rackA, String rackB) -> {
+                List<InetAddressAndPort> rackMembersA = validated.get(rackA);
+                List<InetAddressAndPort> rackMembersB = validated.get(rackB);
+
+                if (rackMembersA == null || rackMembersA.isEmpty())
+                    return 1;
+                if (rackMembersB == null || rackMembersB.isEmpty())
+                    return -1;
+                
+                endpointsSorter.accept(rackMembersA);
+                endpointsSorter.accept(rackMembersB);
+
+                return enpointsCompare.test(rackMembersA.get(0), rackMembersB.get(0)) ? 1 : -1;
+            });
+            racks = sortedRacks;
         }
 
         // grab a random member of up to two racks
         List<InetAddressAndPort> result = new ArrayList<>(2);
         for (String rack : Iterables.limit(racks, 2))
         {
             List<InetAddressAndPort> rackMembers = validated.get(rack);
-            result.add(rackMembers.get(indexPicker.apply(rackMembers.size())));
+            result.add(rackMembers.get(0)); // Pick the node with fastest response time in the rack

Review Comment:
   If there are exactly 2 racks, then I don't think the nodes have been sorted



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org