You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org> on 2011/10/26 17:57:37 UTC

[jira] [Commented] (CASSANDRA-2506) Push read repair setting down to the DC-level

    [ https://issues.apache.org/jira/browse/CASSANDRA-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136057#comment-13136057 ] 

Sylvain Lebresne commented on CASSANDRA-2506:
---------------------------------------------

On the principle itself, I'm a bit sad that we would add a second configuration options just for read_repair, especially given that read_repair is becoming much less useful in 1.0. I'm also afraid that the configuration options as implemented by this patch will be a bit confusing for people. The fact that read_repair_chance concern repair on the whole cluster while read_repair_chance_options concerns repair local to the cluster in particular and the fact that read_repair_chance has "priority" over the dc options.

That being said, I agree that distinguishing between read repair that cross dc boundaries and those that don't have merits, and that the proposed solution does allow for reasonable scenarios. And I have don't really see a much better solution right off the bat.

On the patches themselves:
* In ReadCallback, in the LOCAL case, we're including all the 'local' nodes, but that's not necessarilly enough to get blockFor endpoints. As is, it breaks CL.QUORUM and CL.ALL on multi-dc setup.
* The patch breaks DataCenterReadCallback. In the case where we don't read repair, we want to include blockFor endpoints *from the local DC*. That last part is crucial, otherwise we're breaking CL.LOCAL_QUORUM. The patch replace this by a sublist of the initial endpoints, but even though the endpoints are sorted by the snitch, there is no guarantee that the blockFor first one will be all of the local DC. It should be the case if the snitch is not ill configured, but there is no guarantee. That's why DCReadCallback if overriding preferedEndpoints in the first place.
* I would maybe rename read_repair_chance_options to something like dclocal_read_repair_chance or something like that.
* I think we can probably get rid of READ_REPAIR_TYPE and avoid too much duplication with DCReadCallback with a small refactor. For instance, we would have something like
  {noformat}
  private List<InetAddress> filterEndpoints(List<InetAddress> endpoints)
  {
      if (resolver instanceof RowRepairResolver)
          return endpoints;

      if (!(resolver instanceof RowDigestResolver))
          return preferedEndpoints(endpoints);

      double chance = random.get().nextDouble();
      if (cfmd.getReadRepairChance() > chance)
          return endpoints;
      Double dcConfigChance = cfmd.getReadRepairChanceOptions().get(localdc);
      if (dcConfigChance != null && dcConfigChance > chance)
          return localEndpoints(endpoints, localdc);
      else
          return preferedEndpoints(endpoints);
  }
  {noformat}
  Then localEndpoints would basically picks all the endpoints of the local dc plus enough of other dc to match blockFor and preferedEndpoints shouldn't have to change at all from what it is now.  That filterEndpoints would directly be used in the ReadCallback constructor.
* For compatibility sake with thrift, I don't think we should renumber the fields, the new option should be at the end.
* The patch seems to randomly change some field visibility for no reason I can see (the command and received field for RC are made public as well as the static random).
* In CFMetadata, a default is defined for the new options, but we still have the line
  {noformat}
  private Map<String, Double> readRepairChanceOptions = new HashMap<String, Double>(); // defaults to null
  {noformat}
  which will allocate a map that will be overwrited by the defaul right away (and the default is empty, not null)
* For CQL, the keyword is added but is not used. It would also be nice to update the docs for the cli and CQL while we're at it.

                
> Push read repair setting down to the DC-level
> ---------------------------------------------
>
>                 Key: CASSANDRA-2506
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2506
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 1.0.1
>            Reporter: Brandon Williams
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.0.2
>
>         Attachments: 0001-changes-to-interfaces.patch, 0002-changes-for-dc-readrepair.patch, 0003-changes-for-cql.patch
>
>
> Currently, read repair is a global setting.  However, when you have two DCs and use one for analytics, it would be nice to turn it off only for that DC so the live DC serving the application can still benefit from it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira