Posted to commits@cassandra.apache.org by "Philip Andronov (Created) (JIRA)" <ji...@apache.org> on 2012/02/03 12:41:53 UTC
[jira] [Created] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
Unnecessary ReadRepair request during RangeScan
------------------------------------------------
Key: CASSANDRA-3843
URL: https://issues.apache.org/jira/browse/CASSANDRA-3843
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.0.0
Reporter: Philip Andronov
Priority: Critical
During reads at consistency level Quorum with a replication factor greater than 2, Cassandra sends at least one ReadRepair even when there is no need to do so.
Since read requests wait for the ReadRepair to finish, this slows requests down a lot, up to the point of a Timeout :(
The problem seems to have been introduced by CASSANDRA-2494; unfortunately I do not know Cassandra's internals well enough to fix it without breaking the CASSANDRA-2494 functionality, so my report comes without a patch.
Code explanations:
{code:title=RangeSliceResponseResolver.java|borderStyle=solid}
class RangeSliceResponseResolver {
    // ....
    private class Reducer extends MergeIterator.Reducer<Pair<Row,InetAddress>, Row>
    {
        // ....
        protected Row getReduced()
        {
            ColumnFamily resolved = versions.size() > 1
                                  ? RowRepairResolver.resolveSuperset(versions)
                                  : versions.get(0);
            if (versions.size() < sources.size())
            {
                for (InetAddress source : sources)
                {
                    if (!versionSources.contains(source))
                    {
                        // [PA] Here we add a null ColumnFamily; later it is
                        // compared with the "desired" version, which yields a
                        // "fake" difference and forces Cassandra to send a
                        // ReadRepair to the given source
                        versions.add(null);
                        versionSources.add(source);
                    }
                }
            }
            // ....
            if (resolved != null)
                repairResults.addAll(RowRepairResolver.scheduleRepairs(resolved, table, key, versions, versionSources));
            // ....
        }
    }
}
{code}
{code:title=RowRepairResolver.java|borderStyle=solid}
public class RowRepairResolver extends AbstractRowResolver {
    // ....
    public static List<IAsyncResult> scheduleRepairs(ColumnFamily resolved, String table, DecoratedKey<?> key, List<ColumnFamily> versions, List<InetAddress> endpoints)
    {
        List<IAsyncResult> results = new ArrayList<IAsyncResult>(versions.size());
        for (int i = 0; i < versions.size(); i++)
        {
            // Sooner or later we compare null against resolved, which are
            // obviously not equal, so a ReadRepair is fired even though it
            // is not needed here
            ColumnFamily diffCf = ColumnFamily.diff(versions.get(i), resolved);
            if (diffCf == null)
                continue;
            // ....
{code}
Imagine the following situation:
NodeA has X.1 // row X with version 1
NodeB has X.2
NodeC has X.? // unknown version, but because the write was at Quorum it is 1 or 2
During a Quorum read from nodes A and B, Cassandra creates version 12 and sends a ReadRepair, so the nodes now have the following content:
NodeA has X.12
NodeB has X.12
which is correct. However, Cassandra also fires a ReadRepair to NodeC. There is no need to do that: the next consistent read will either be served by nodes A/B (no ReadRepair) or by some pair including node C, in which case a ReadRepair is fired that brings NodeC to a consistent state.
If you are reading from the Index, then sooner or later you will get a TimeoutException because the cluster is overloaded by ReadRepair requests *even* if all nodes have the same data :(
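The null-placeholder behavior described above can be reproduced outside Cassandra with a small toy model (this is an illustrative sketch, not Cassandra's real API: a "version" is just a Map, and a missing response is a null, like the null ColumnFamily added in getReduced()):

{code:title=NullVersionDiffDemo.java|borderStyle=solid}
import java.util.HashMap;
import java.util.Map;

public class NullVersionDiffDemo {
    // Toy stand-in for ColumnFamily.diff(version, resolved): returns the
    // columns the replica is missing, or null when the replica already has
    // every resolved column.
    static Map<String, String> diff(Map<String, String> version, Map<String, String> resolved) {
        if (version == null)
            return resolved; // a null version always "differs" -> a repair gets scheduled
        Map<String, String> missing = new HashMap<>(resolved);
        missing.keySet().removeAll(version.keySet());
        return missing.isEmpty() ? null : missing;
    }

    public static void main(String[] args) {
        Map<String, String> resolved = Map.of("col", "X.12");

        // Replica that responded with matching data: diff is null, no repair.
        assert diff(Map.of("col", "X.12"), resolved) == null;

        // Replica that never responded (null placeholder): diff is non-null,
        // so a ReadRepair is fired even though its data may already match.
        assert diff(null, resolved) != null;
    }
}
{code}

The point of the sketch is only that the null placeholder can never compare equal to the resolved version, so every non-responding replica is unconditionally scheduled for repair.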
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-3843:
--------------------------------------
Fix Version/s: 1.0.8
Assignee: Jonathan Ellis
[jira] [Updated] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
Posted by "Philip Andronov (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Philip Andronov updated CASSANDRA-3843:
---------------------------------------
[jira] [Updated] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-3843:
--------------------------------------
Attachment: 3843.txt
The null version was added for CASSANDRA-2680. I think the core problem here is that the RSRR is being created with *all* the replica endpoints ({{liveEndpoints}}), not just the ones being asked to respond to the query ({{handler.endpoints}}). This is a bit tricky since the handler wants to know its resolver at creation time, and unlike RowDigestResolver, RSRR wants to initialize its endpoints at creation time too. Kind of hackish patch attached.
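The idea behind the patch can be sketched at toy scale (names here are illustrative, not the actual 1.0 code): if repairs are computed only over the endpoints that were actually asked to respond, a live replica outside the query set never gets diffed against a null placeholder, so it never receives a spurious repair.

{code:title=EndpointFilterSketch.java|borderStyle=solid}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class EndpointFilterSketch {
    // Returns the queried endpoints whose version differs from (or is
    // missing relative to) the resolved value "resolved". Endpoints that
    // were never queried are simply not candidates for repair.
    static List<String> repairTargets(Set<String> queried, Map<String, String> responses) {
        List<String> targets = new ArrayList<>();
        for (String endpoint : queried) {
            String version = responses.get(endpoint);
            if (version == null || !version.equals("resolved"))
                targets.add(endpoint);
        }
        return targets;
    }

    public static void main(String[] args) {
        // RF=3: A and B were queried for quorum; C was not queried at all.
        Set<String> queried = Set.of("A", "B");

        // Both queried replicas match the resolved value: nothing to repair,
        // and C is never considered, so no spurious ReadRepair is sent.
        assert repairTargets(queried, Map.of("A", "resolved", "B", "resolved")).isEmpty();

        // A genuinely stale queried replica still gets repaired.
        assert repairTargets(queried, Map.of("A", "resolved", "B", "old")).equals(List.of("B"));
    }
}
{code}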
[jira] [Updated] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-3843:
--------------------------------------
Reviewer: vijay2win@yahoo.com
Priority: Major (was: Critical)
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
Posted by "Jeremy Hanna (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212617#comment-13212617 ]
Jeremy Hanna commented on CASSANDRA-3843:
-----------------------------------------
I did repairs on all the nodes and then compactions on all the nodes. Then I ran a Pig job that simply counts the number of rows in the column family. Again, I think overall writes were reduced, but there are still writes going on; I need to turn debug on and run the same test again. I did the compactions at 6:42 and the range scans at 14:16:
-rw-r--r-- 1 root root 40106228511 Feb 21 06:42 account_snapshot-g-792-Data.db
-rw-r--r-- 1 root root 206884816 Feb 21 06:42 account_snapshot-g-792-Filter.db
-rw-r--r-- 1 root root 2913796038 Feb 21 06:42 account_snapshot-g-792-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 06:42 account_snapshot-g-792-Statistics.db
-rw-r--r-- 1 root root 0 Feb 21 14:20 account_snapshot-g-793-Compacted
-rw-r--r-- 1 root root 287286 Feb 21 14:16 account_snapshot-g-793-Data.db
-rw-r--r-- 1 root root 976 Feb 21 14:16 account_snapshot-g-793-Filter.db
-rw-r--r-- 1 root root 20857 Feb 21 14:16 account_snapshot-g-793-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:16 account_snapshot-g-793-Statistics.db
-rw-r--r-- 1 root root 0 Feb 21 14:20 account_snapshot-g-794-Compacted
-rw-r--r-- 1 root root 87770771 Feb 21 14:17 account_snapshot-g-794-Data.db
-rw-r--r-- 1 root root 293944 Feb 21 14:17 account_snapshot-g-794-Filter.db
-rw-r--r-- 1 root root 6377968 Feb 21 14:17 account_snapshot-g-794-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:17 account_snapshot-g-794-Statistics.db
-rw-r--r-- 1 root root 0 Feb 21 14:20 account_snapshot-g-795-Compacted
-rw-r--r-- 1 root root 78459166 Feb 21 14:17 account_snapshot-g-795-Data.db
-rw-r--r-- 1 root root 262600 Feb 21 14:17 account_snapshot-g-795-Filter.db
-rw-r--r-- 1 root root 5698156 Feb 21 14:17 account_snapshot-g-795-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:17 account_snapshot-g-795-Statistics.db
-rw-r--r-- 1 root root 0 Feb 21 14:20 account_snapshot-g-796-Compacted
-rw-r--r-- 1 root root 69838937 Feb 21 14:17 account_snapshot-g-796-Data.db
-rw-r--r-- 1 root root 234000 Feb 21 14:17 account_snapshot-g-796-Filter.db
-rw-r--r-- 1 root root 5077447 Feb 21 14:17 account_snapshot-g-796-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:17 account_snapshot-g-796-Statistics.db
-rw-r--r-- 1 root root 0 Feb 21 14:20 account_snapshot-g-797-Compacted
-rw-r--r-- 1 root root 68094433 Feb 21 14:17 account_snapshot-g-797-Data.db
-rw-r--r-- 1 root root 227808 Feb 21 14:17 account_snapshot-g-797-Filter.db
-rw-r--r-- 1 root root 4943098 Feb 21 14:17 account_snapshot-g-797-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:17 account_snapshot-g-797-Statistics.db
-rw-r--r-- 1 root root 304163307 Feb 21 14:20 account_snapshot-g-798-Data.db
-rw-r--r-- 1 root root 1019776 Feb 21 14:20 account_snapshot-g-798-Filter.db
-rw-r--r-- 1 root root 22096669 Feb 21 14:20 account_snapshot-g-798-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:20 account_snapshot-g-798-Statistics.db
-rw-r--r-- 1 root root 65874829 Feb 21 14:18 account_snapshot-g-799-Data.db
-rw-r--r-- 1 root root 220192 Feb 21 14:18 account_snapshot-g-799-Filter.db
-rw-r--r-- 1 root root 4777809 Feb 21 14:18 account_snapshot-g-799-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:18 account_snapshot-g-799-Statistics.db
-rw-r--r-- 1 root root 0 Feb 21 14:20 account_snapshot-g-800-Compacted
-rw-r--r-- 1 root root 50067413 Feb 21 14:18 account_snapshot-g-800-Data.db
-rw-r--r-- 1 root root 167416 Feb 21 14:18 account_snapshot-g-800-Filter.db
-rw-r--r-- 1 root root 3632313 Feb 21 14:18 account_snapshot-g-800-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:18 account_snapshot-g-800-Statistics.db
-rw-r--r-- 1 root root 0 Feb 21 14:20 account_snapshot-g-801-Compacted
-rw-r--r-- 1 root root 50575719 Feb 21 14:18 account_snapshot-g-801-Data.db
-rw-r--r-- 1 root root 169160 Feb 21 14:18 account_snapshot-g-801-Filter.db
-rw-r--r-- 1 root root 3669880 Feb 21 14:18 account_snapshot-g-801-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:18 account_snapshot-g-801-Statistics.db
-rw-r--r-- 1 root root 0 Feb 21 14:20 account_snapshot-g-802-Compacted
-rw-r--r-- 1 root root 41788766 Feb 21 14:19 account_snapshot-g-802-Data.db
-rw-r--r-- 1 root root 139776 Feb 21 14:19 account_snapshot-g-802-Filter.db
-rw-r--r-- 1 root root 3033069 Feb 21 14:19 account_snapshot-g-802-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:19 account_snapshot-g-802-Statistics.db
-rw-r--r-- 1 root root 46547146 Feb 21 14:19 account_snapshot-g-803-Data.db
-rw-r--r-- 1 root root 155720 Feb 21 14:19 account_snapshot-g-803-Filter.db
-rw-r--r-- 1 root root 3378457 Feb 21 14:19 account_snapshot-g-803-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:19 account_snapshot-g-803-Statistics.db
-rw-r--r-- 1 root root 142719184 Feb 21 14:20 account_snapshot-g-804-Data.db
-rw-r--r-- 1 root root 478576 Feb 21 14:19 account_snapshot-g-804-Filter.db
-rw-r--r-- 1 root root 10356119 Feb 21 14:20 account_snapshot-g-804-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:20 account_snapshot-g-804-Statistics.db
-rw-r--r-- 1 root root 55373874 Feb 21 14:19 account_snapshot-g-805-Data.db
-rw-r--r-- 1 root root 185160 Feb 21 14:19 account_snapshot-g-805-Filter.db
-rw-r--r-- 1 root root 4017391 Feb 21 14:19 account_snapshot-g-805-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:19 account_snapshot-g-805-Statistics.db
-rw-r--r-- 1 root root 46399227 Feb 21 14:19 account_snapshot-g-806-Data.db
-rw-r--r-- 1 root root 155120 Feb 21 14:19 account_snapshot-g-806-Filter.db
-rw-r--r-- 1 root root 3365947 Feb 21 14:19 account_snapshot-g-806-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:19 account_snapshot-g-806-Statistics.db
-rw-r--r-- 1 root root 58491393 Feb 21 14:19 account_snapshot-g-807-Data.db
-rw-r--r-- 1 root root 196048 Feb 21 14:19 account_snapshot-g-807-Filter.db
-rw-r--r-- 1 root root 4253922 Feb 21 14:19 account_snapshot-g-807-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:19 account_snapshot-g-807-Statistics.db
-rw-r--r-- 1 root root 47609635 Feb 21 14:20 account_snapshot-g-808-Data.db
-rw-r--r-- 1 root root 159320 Feb 21 14:20 account_snapshot-g-808-Filter.db
-rw-r--r-- 1 root root 3456985 Feb 21 14:20 account_snapshot-g-808-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:20 account_snapshot-g-808-Statistics.db
-rw-r--r-- 1 root root 46923060 Feb 21 14:20 account_snapshot-tmp-g-809-Data.db
-rw-r--r-- 1 root root 0 Feb 21 14:20 account_snapshot-tmp-g-809-Index.db
-rw-r--r-- 1 root root 49693602 Feb 21 14:20 account_snapshot-tmp-g-810-Data.db
-rw-r--r-- 1 root root 166600 Feb 21 14:20 account_snapshot-tmp-g-810-Filter.db
-rw-r--r-- 1 root root 3614750 Feb 21 14:20 account_snapshot-tmp-g-810-Index.db
> Unnecessary ReadRepair request during RangeScan
> ------------------------------------------------
>
> Key: CASSANDRA-3843
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3843
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0.0
> Reporter: Philip Andronov
> Assignee: Jonathan Ellis
> Fix For: 1.0.8
>
> Attachments: 3843-v2.txt, 3843.txt
>
>
> During a read at Quorum with a replication factor greater than 2, Cassandra sends at least one ReadRepair even when there is no need to.
> Because read requests wait until the ReadRepair finishes, this slows requests down a lot, sometimes all the way to the timeout :(
> The problem seems to have been introduced by CASSANDRA-2494; unfortunately I don't know Cassandra's internals well enough to fix it without breaking the CASSANDRA-2494 functionality, so my report comes without a patch.
> Code explanations:
> {code:title=RangeSliceResponseResolver.java|borderStyle=solid}
> class RangeSliceResponseResolver {
>     // ....
>     private class Reducer extends MergeIterator.Reducer<Pair<Row,InetAddress>, Row>
>     {
>         // ....
>         protected Row getReduced()
>         {
>             ColumnFamily resolved = versions.size() > 1
>                                   ? RowRepairResolver.resolveSuperset(versions)
>                                   : versions.get(0);
>             if (versions.size() < sources.size())
>             {
>                 for (InetAddress source : sources)
>                 {
>                     if (!versionSources.contains(source))
>                     {
>                         // [PA] Here we add a null ColumnFamily.
>                         // Later it is compared with the "desired" version,
>                         // yielding a "fake" difference that forces Cassandra
>                         // to send a ReadRepair to the given source.
>                         versions.add(null);
>                         versionSources.add(source);
>                     }
>                 }
>             }
>             // ....
>             if (resolved != null)
>                 repairResults.addAll(RowRepairResolver.scheduleRepairs(resolved, table, key, versions, versionSources));
>             // ....
>         }
>     }
> }
> {code}
> {code:title=RowRepairResolver.java|borderStyle=solid}
> public class RowRepairResolver extends AbstractRowResolver {
>     // ....
>     public static List<IAsyncResult> scheduleRepairs(ColumnFamily resolved, String table, DecoratedKey<?> key, List<ColumnFamily> versions, List<InetAddress> endpoints)
>     {
>         List<IAsyncResult> results = new ArrayList<IAsyncResult>(versions.size());
>         for (int i = 0; i < versions.size(); i++)
>         {
>             // On some iteration we compare null with resolved, which are obviously
>             // not equal, so a ReadRepair is fired even though it is not needed here.
>             ColumnFamily diffCf = ColumnFamily.diff(versions.get(i), resolved);
>             if (diffCf == null)
>                 continue;
>             // ....
> {code}
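The null-placeholder behaviour described above can be sketched with a simplified model (hypothetical classes, not Cassandra's actual API): a replica that never reported a version is represented by null, and diffing null against the resolved row always looks like a difference, so a repair is scheduled for that replica.

```java
import java.util.*;

public class NullDiffSketch {
    // Stand-in for ColumnFamily.diff: returns the columns the replica is
    // missing, or null when the replica already matches the resolved row.
    // A null replica version (the placeholder) makes everything look missing.
    static Set<String> diff(Set<String> replicaVersion, Set<String> resolved) {
        if (replicaVersion == null)
            return resolved; // placeholder: the whole resolved row "differs"
        Set<String> missing = new HashSet<>(resolved);
        missing.removeAll(replicaVersion);
        return missing.isEmpty() ? null : missing;
    }

    public static void main(String[] args) {
        Set<String> resolved = new HashSet<>(Arrays.asList("c1", "c2"));
        List<Set<String>> versions = new ArrayList<>();
        versions.add(new HashSet<>(Arrays.asList("c1", "c2"))); // up-to-date replica
        versions.add(null); // replica that simply did not answer for this range

        int repairsScheduled = 0;
        for (Set<String> v : versions)
            if (diff(v, resolved) != null)
                repairsScheduled++;

        // The null placeholder alone accounts for the scheduled repair.
        System.out.println(repairsScheduled);
    }
}
```

The placeholder replica triggers a repair even though nothing is known to be stale, which is the spurious ReadRepair the report describes.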
> Imagine the following situation:
> NodeA has X.1 // row X with version 1
> NodeB has X.2
> NodeC has X.? // unknown version, but because the write used Quorum it is 1 or 2
> During a Quorum read from nodes A and B, Cassandra creates version 12 and sends a ReadRepair, so the nodes now contain:
> NodeA has X.12
> NodeB has X.12
> This is correct; however, Cassandra will also fire a ReadRepair to NodeC. There is no need for that: the next consistent read will either be served by nodes {A, B} (no ReadRepair), or by the pair {?, C}, in which case a ReadRepair will be fired and bring NodeC to a consistent state.
> Right now we read from the index heavily, and from some point in time we start getting TimeoutException because the cluster is overloaded by ReadRepair requests *even* when all nodes have the same data :(
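The fix the scenario suggests can be sketched as a guard in the repair-scheduling loop (a hypothetical simplification, not the actual patch): only schedule repairs for replicas whose reported version genuinely differs, skipping the null placeholders for replicas that never answered.

```java
import java.util.*;

public class RepairGuardSketch {
    // Stand-in for the repair-scheduling loop: counts how many repair
    // messages would be sent. Null placeholder versions are skipped,
    // so replicas that never reported a version are not repaired.
    static int repairsToSend(List<Set<String>> versions, Set<String> resolved) {
        int count = 0;
        for (Set<String> v : versions) {
            if (v == null)
                continue; // replica did not answer; nothing known to repair
            if (!v.equals(resolved))
                count++;  // genuinely stale replica
        }
        return count;
    }

    public static void main(String[] args) {
        // Merged "X.12" from the scenario, modeled as the union of both writes.
        Set<String> resolved = new HashSet<>(Arrays.asList("X.1", "X.2"));
        // NodeA answered X.1, NodeB answered X.2, NodeC did not answer.
        List<Set<String>> versions = Arrays.asList(
                new HashSet<>(Collections.singleton("X.1")),
                new HashSet<>(Collections.singleton("X.2")),
                null);
        // Only A and B are repaired; C is left for a future read to fix.
        System.out.println(repairsToSend(versions, resolved));
    }
}
```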
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request
during RangeScan
Posted by "Jeremy Hanna (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212725#comment-13212725 ]
Jeremy Hanna commented on CASSANDRA-3843:
-----------------------------------------
Good to know - we'll upgrade to 1.0.8 as soon as we can then.
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request
during RangeScan
Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204269#comment-13204269 ]
Vijay commented on CASSANDRA-3843:
----------------------------------
+1
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request
during RangeScan
Posted by "Brandon Williams (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212679#comment-13212679 ]
Brandon Williams commented on CASSANDRA-3843:
---------------------------------------------
I'm unable to repro against 1.0 HEAD.
[jira] [Updated] (CASSANDRA-3843) Unnecessary ReadRepair request
during RangeScan
Posted by "Philip Andronov (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Philip Andronov updated CASSANDRA-3843:
---------------------------------------
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request
during RangeScan
Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207827#comment-13207827 ]
Jonathan Ellis commented on CASSANDRA-3843:
-------------------------------------------
I suggest testing with a single range scan at debug level. Too much hay to see the needle when you're doing 100s or 1000s of scans.
[jira] [Issue Comment Edited] (CASSANDRA-3843) Unnecessary
ReadRepair request during RangeScan
Posted by "Jeremy Hanna (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212617#comment-13212617 ]
Jeremy Hanna edited comment on CASSANDRA-3843 at 2/21/12 2:30 PM:
------------------------------------------------------------------
I ran repairs on all the nodes and then compactions on all the nodes. Then I ran a Pig job that simply counts the number of rows in the column family. Again, I think the overall writes were reduced, but writes are still going on. I need to turn debug on and run the same test again. I did the compactions at 6:42 and the range scans at 14:16:
{code}
-rw-r--r-- 1 root root 40106228511 Feb 21 06:42 account_snapshot-g-792-Data.db
-rw-r--r-- 1 root root 206884816 Feb 21 06:42 account_snapshot-g-792-Filter.db
-rw-r--r-- 1 root root 2913796038 Feb 21 06:42 account_snapshot-g-792-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 06:42 account_snapshot-g-792-Statistics.db
-rw-r--r-- 1 root root 0 Feb 21 14:20 account_snapshot-g-793-Compacted
-rw-r--r-- 1 root root 287286 Feb 21 14:16 account_snapshot-g-793-Data.db
-rw-r--r-- 1 root root 976 Feb 21 14:16 account_snapshot-g-793-Filter.db
-rw-r--r-- 1 root root 20857 Feb 21 14:16 account_snapshot-g-793-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:16 account_snapshot-g-793-Statistics.db
-rw-r--r-- 1 root root 0 Feb 21 14:20 account_snapshot-g-794-Compacted
-rw-r--r-- 1 root root 87770771 Feb 21 14:17 account_snapshot-g-794-Data.db
-rw-r--r-- 1 root root 293944 Feb 21 14:17 account_snapshot-g-794-Filter.db
-rw-r--r-- 1 root root 6377968 Feb 21 14:17 account_snapshot-g-794-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:17 account_snapshot-g-794-Statistics.db
-rw-r--r-- 1 root root 0 Feb 21 14:20 account_snapshot-g-795-Compacted
-rw-r--r-- 1 root root 78459166 Feb 21 14:17 account_snapshot-g-795-Data.db
-rw-r--r-- 1 root root 262600 Feb 21 14:17 account_snapshot-g-795-Filter.db
-rw-r--r-- 1 root root 5698156 Feb 21 14:17 account_snapshot-g-795-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:17 account_snapshot-g-795-Statistics.db
-rw-r--r-- 1 root root 0 Feb 21 14:20 account_snapshot-g-796-Compacted
-rw-r--r-- 1 root root 69838937 Feb 21 14:17 account_snapshot-g-796-Data.db
-rw-r--r-- 1 root root 234000 Feb 21 14:17 account_snapshot-g-796-Filter.db
-rw-r--r-- 1 root root 5077447 Feb 21 14:17 account_snapshot-g-796-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:17 account_snapshot-g-796-Statistics.db
-rw-r--r-- 1 root root 0 Feb 21 14:20 account_snapshot-g-797-Compacted
-rw-r--r-- 1 root root 68094433 Feb 21 14:17 account_snapshot-g-797-Data.db
-rw-r--r-- 1 root root 227808 Feb 21 14:17 account_snapshot-g-797-Filter.db
-rw-r--r-- 1 root root 4943098 Feb 21 14:17 account_snapshot-g-797-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:17 account_snapshot-g-797-Statistics.db
-rw-r--r-- 1 root root 304163307 Feb 21 14:20 account_snapshot-g-798-Data.db
-rw-r--r-- 1 root root 1019776 Feb 21 14:20 account_snapshot-g-798-Filter.db
-rw-r--r-- 1 root root 22096669 Feb 21 14:20 account_snapshot-g-798-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:20 account_snapshot-g-798-Statistics.db
-rw-r--r-- 1 root root 65874829 Feb 21 14:18 account_snapshot-g-799-Data.db
-rw-r--r-- 1 root root 220192 Feb 21 14:18 account_snapshot-g-799-Filter.db
-rw-r--r-- 1 root root 4777809 Feb 21 14:18 account_snapshot-g-799-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:18 account_snapshot-g-799-Statistics.db
-rw-r--r-- 1 root root 0 Feb 21 14:20 account_snapshot-g-800-Compacted
-rw-r--r-- 1 root root 50067413 Feb 21 14:18 account_snapshot-g-800-Data.db
-rw-r--r-- 1 root root 167416 Feb 21 14:18 account_snapshot-g-800-Filter.db
-rw-r--r-- 1 root root 3632313 Feb 21 14:18 account_snapshot-g-800-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:18 account_snapshot-g-800-Statistics.db
-rw-r--r-- 1 root root 0 Feb 21 14:20 account_snapshot-g-801-Compacted
-rw-r--r-- 1 root root 50575719 Feb 21 14:18 account_snapshot-g-801-Data.db
-rw-r--r-- 1 root root 169160 Feb 21 14:18 account_snapshot-g-801-Filter.db
-rw-r--r-- 1 root root 3669880 Feb 21 14:18 account_snapshot-g-801-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:18 account_snapshot-g-801-Statistics.db
-rw-r--r-- 1 root root 0 Feb 21 14:20 account_snapshot-g-802-Compacted
-rw-r--r-- 1 root root 41788766 Feb 21 14:19 account_snapshot-g-802-Data.db
-rw-r--r-- 1 root root 139776 Feb 21 14:19 account_snapshot-g-802-Filter.db
-rw-r--r-- 1 root root 3033069 Feb 21 14:19 account_snapshot-g-802-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:19 account_snapshot-g-802-Statistics.db
-rw-r--r-- 1 root root 46547146 Feb 21 14:19 account_snapshot-g-803-Data.db
-rw-r--r-- 1 root root 155720 Feb 21 14:19 account_snapshot-g-803-Filter.db
-rw-r--r-- 1 root root 3378457 Feb 21 14:19 account_snapshot-g-803-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:19 account_snapshot-g-803-Statistics.db
-rw-r--r-- 1 root root 142719184 Feb 21 14:20 account_snapshot-g-804-Data.db
-rw-r--r-- 1 root root 478576 Feb 21 14:19 account_snapshot-g-804-Filter.db
-rw-r--r-- 1 root root 10356119 Feb 21 14:20 account_snapshot-g-804-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:20 account_snapshot-g-804-Statistics.db
-rw-r--r-- 1 root root 55373874 Feb 21 14:19 account_snapshot-g-805-Data.db
-rw-r--r-- 1 root root 185160 Feb 21 14:19 account_snapshot-g-805-Filter.db
-rw-r--r-- 1 root root 4017391 Feb 21 14:19 account_snapshot-g-805-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:19 account_snapshot-g-805-Statistics.db
-rw-r--r-- 1 root root 46399227 Feb 21 14:19 account_snapshot-g-806-Data.db
-rw-r--r-- 1 root root 155120 Feb 21 14:19 account_snapshot-g-806-Filter.db
-rw-r--r-- 1 root root 3365947 Feb 21 14:19 account_snapshot-g-806-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:19 account_snapshot-g-806-Statistics.db
-rw-r--r-- 1 root root 58491393 Feb 21 14:19 account_snapshot-g-807-Data.db
-rw-r--r-- 1 root root 196048 Feb 21 14:19 account_snapshot-g-807-Filter.db
-rw-r--r-- 1 root root 4253922 Feb 21 14:19 account_snapshot-g-807-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:19 account_snapshot-g-807-Statistics.db
-rw-r--r-- 1 root root 47609635 Feb 21 14:20 account_snapshot-g-808-Data.db
-rw-r--r-- 1 root root 159320 Feb 21 14:20 account_snapshot-g-808-Filter.db
-rw-r--r-- 1 root root 3456985 Feb 21 14:20 account_snapshot-g-808-Index.db
-rw-r--r-- 1 root root 4276 Feb 21 14:20 account_snapshot-g-808-Statistics.db
-rw-r--r-- 1 root root 46923060 Feb 21 14:20 account_snapshot-tmp-g-809-Data.db
-rw-r--r-- 1 root root 0 Feb 21 14:20 account_snapshot-tmp-g-809-Index.db
-rw-r--r-- 1 root root 49693602 Feb 21 14:20 account_snapshot-tmp-g-810-Data.db
-rw-r--r-- 1 root root 166600 Feb 21 14:20 account_snapshot-tmp-g-810-Filter.db
-rw-r--r-- 1 root root 3614750 Feb 21 14:20 account_snapshot-tmp-g-810-Index.db
{code}
> Unnecessary ReadRepair request during RangeScan
> ------------------------------------------------
>
> Key: CASSANDRA-3843
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3843
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0.0
> Reporter: Philip Andronov
> Assignee: Jonathan Ellis
> Fix For: 1.0.8
>
> Attachments: 3843-v2.txt, 3843.txt
>
>
> During a read at consistency level Quorum with a replication factor greater than 2, Cassandra sends at least one ReadRepair even when none is needed.
> Because the read request waits for the ReadRepair to finish, this slows requests down considerably, up to the point of timing out :(
> The problem seems to have been introduced by CASSANDRA-2494; unfortunately I don't know Cassandra's internals well enough to fix it without breaking the CASSANDRA-2494 functionality, so this report comes without a patch.
> Code explanations:
> {code:title=RangeSliceResponseResolver.java|borderStyle=solid}
> class RangeSliceResponseResolver
> {
>     // ....
>     private class Reducer extends MergeIterator.Reducer<Pair<Row, InetAddress>, Row>
>     {
>         // ....
>         protected Row getReduced()
>         {
>             ColumnFamily resolved = versions.size() > 1
>                                   ? RowRepairResolver.resolveSuperset(versions)
>                                   : versions.get(0);
>             if (versions.size() < sources.size())
>             {
>                 for (InetAddress source : sources)
>                 {
>                     if (!versionSources.contains(source))
>                     {
>                         // [PA] Here we add a null ColumnFamily. Later it is
>                         // compared with the "desired" version, which yields a
>                         // "fake" difference and forces Cassandra to send a
>                         // ReadRepair to the given source.
>                         versions.add(null);
>                         versionSources.add(source);
>                     }
>                 }
>             }
>             // ....
>             if (resolved != null)
>                 repairResults.addAll(RowRepairResolver.scheduleRepairs(resolved, table, key, versions, versionSources));
>             // ....
>         }
>     }
> }
> {code}
> {code:title=RowRepairResolver.java|borderStyle=solid}
> public class RowRepairResolver extends AbstractRowResolver
> {
>     // ....
>     public static List<IAsyncResult> scheduleRepairs(ColumnFamily resolved, String table, DecoratedKey<?> key, List<ColumnFamily> versions, List<InetAddress> endpoints)
>     {
>         List<IAsyncResult> results = new ArrayList<IAsyncResult>(versions.size());
>         for (int i = 0; i < versions.size(); i++)
>         {
>             // On some iteration we end up comparing null with resolved, which are
>             // obviously not equal, so a ReadRepair is fired even though it is not
>             // needed here.
>             ColumnFamily diffCf = ColumnFamily.diff(versions.get(i), resolved);
>             if (diffCf == null)
>                 continue;
>             // ....
> {code}
> Imagine the following situation:
> NodeA has X.1 // row X with version 1
> NodeB has X.2
> NodeC has X.? // unknown version, but because the write was at Quorum it is 1 or 2
> During a Quorum read from nodes A and B, Cassandra resolves version 12 and sends a ReadRepair, so the nodes now contain:
> NodeA has X.12
> NodeB has X.12
> which is correct; however, Cassandra will also fire a ReadRepair to NodeC. There is no need to: the next consistent read is either served by nodes {A, B} (no ReadRepair needed) or by a pair {?, C}, in which case a ReadRepair will be fired and bring NodeC into a consistent state
> Right now we are reading from the Index a lot, and from some point in time we start getting TimeOutExceptions because the cluster is overloaded by ReadRepair requests *even* when all nodes have the same data :(
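[Editorial note] The padding behaviour described above can be reproduced in isolation. The sketch below is a simplified, hypothetical model — not Cassandra's real classes; `diff` here only mimics `ColumnFamily.diff` by returning non-null whenever the two arguments differ, and the `skipNulls` guard is one possible mitigation, not necessarily the committed patch:

```java
import java.util.*;

// Hypothetical, simplified model of the Reducer/scheduleRepairs interaction
// described above -- not Cassandra's actual API.
public class RepairPaddingDemo
{
    // Mimics ColumnFamily.diff(): null means "no difference, nothing to repair".
    static String diff(String version, String resolved)
    {
        return Objects.equals(version, resolved) ? null : resolved;
    }

    // Returns the endpoints that would be sent a read repair.
    // If skipNulls is true, sources that returned no data are ignored
    // (an assumed guard, for illustration only).
    static List<String> repairTargets(List<String> versions,
                                      List<String> endpoints,
                                      String resolved,
                                      boolean skipNulls)
    {
        List<String> targets = new ArrayList<>();
        for (int i = 0; i < versions.size(); i++)
        {
            if (skipNulls && versions.get(i) == null)
                continue;
            if (diff(versions.get(i), resolved) != null)
                targets.add(endpoints.get(i));
        }
        return targets;
    }

    public static void main(String[] args)
    {
        // Nodes A and B answered with identical, already-consistent data.
        List<String> versions = new ArrayList<>(Arrays.asList("X.12", "X.12"));
        List<String> endpoints = new ArrayList<>(Arrays.asList("A", "B"));

        // The Reducer pads the silent source C with a null version...
        versions.add(null);
        endpoints.add("C");

        // ...so diff(null, resolved) is non-null and C is always repaired:
        System.out.println(repairTargets(versions, endpoints, "X.12", false)); // [C]

        // Skipping padded nulls fires no repair when all answers agreed:
        System.out.println(repairTargets(versions, endpoints, "X.12", true));  // []
    }
}
```

Running the model shows exactly the reported symptom: a repair is scheduled for NodeC even though every response that actually arrived was identical.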
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request
during RangeScan
Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207046#comment-13207046 ]
Jonathan Ellis commented on CASSANDRA-3843:
-------------------------------------------
It's a relatively small patch, but StorageProxy and its callbacks can be fragile... I almost didn't commit it to 1.0 either. Tell you what though, I'll post a backported patch here and if you want you can run with it. :)
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request
during RangeScan
Posted by "Jeremy Hanna (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207832#comment-13207832 ]
Jeremy Hanna commented on CASSANDRA-3843:
-----------------------------------------
I did patch with v2. Doing more testing today and it appears that there are writes occurring but it looks like a definite reduction. It could be a valid repair thing. I'll do some more testing and hopefully repair every node and compact every node and then do a scan across a large column family and see what happens.
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request
during RangeScan
Posted by "Jeremy Hanna (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206881#comment-13206881 ]
Jeremy Hanna commented on CASSANDRA-3843:
-----------------------------------------
We'll be upgrading to 1.0.8 as soon as we can, but this seems like a significant issue for anyone doing range scans - does it make sense to backport to 0.8.x?
[jira] [Resolved] (CASSANDRA-3843) Unnecessary ReadRepair request
during RangeScan
Posted by "Jonathan Ellis (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis resolved CASSANDRA-3843.
---------------------------------------
Resolution: Fixed
> Unnecessary ReadRepair request during RangeScan
> ------------------------------------------------
>
> Key: CASSANDRA-3843
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3843
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0.0
> Reporter: Philip Andronov
> Assignee: Jonathan Ellis
> Fix For: 1.0.8
>
> Attachments: 3843-v2.txt, 3843.txt
>
>
> During reading with Quorum level and replication factor greater then 2, Cassandra sends at least one ReadRepair, even if there is no need to do that.
> With the fact that read requests await until ReadRepair will finish it slows down requsts a lot, up to the Timeout :(
> It seems that the problem has been introduced by the CASSANDRA-2494, unfortunately I have no enought knowledge of Cassandra internals to fix the problem and do not broke CASSANDRA-2494 functionality, so my report without a patch.
> Code explanations:
> {code:title=RangeSliceResponseResolver.java|borderStyle=solid}
> class RangeSliceResponseResolver {
> // ....
> private class Reducer extends MergeIterator.Reducer<Pair<Row,InetAddress>, Row>
> {
> // ....
> protected Row getReduced()
> {
> ColumnFamily resolved = versions.size() > 1
> ? RowRepairResolver.resolveSuperset(versions)
> : versions.get(0);
> if (versions.size() < sources.size())
> {
> for (InetAddress source : sources)
> {
> if (!versionSources.contains(source))
> {
>
> // [PA] Here we are adding null ColumnFamily.
> // later it will be compared with the "desired"
> // version and will give us "fake" difference which
> // forces Cassandra to send ReadRepair to a given source
> versions.add(null);
> versionSources.add(source);
> }
> }
> }
> // ....
> if (resolved != null)
> repairResults.addAll(RowRepairResolver.scheduleRepairs(resolved, table, key, versions, versionSources));
> // ....
> }
> }
> }
> {code}
> {code:title=RowRepairResolver.java|borderStyle=solid}
> public class RowRepairResolver extends AbstractRowResolver {
> // ....
> public static List<IAsyncResult> scheduleRepairs(ColumnFamily resolved, String table, DecoratedKey<?> key, List<ColumnFamily> versions, List<InetAddress> endpoints)
> {
> List<IAsyncResult> results = new ArrayList<IAsyncResult>(versions.size());
> for (int i = 0; i < versions.size(); i++)
> {
> // On some iteration we have to compare null and resolved which are obviously
> // not equals, so it will fire a ReadRequest, however it is not needed here
> ColumnFamily diffCf = ColumnFamily.diff(versions.get(i), resolved);
> if (diffCf == null)
> continue;
> // ....
> {code}
> Imagine the following situation:
> NodeA has X.1 // row X with version 1
> NodeB has X.2
> NodeC has X.? // unknown version, but because the write used Quorum it is 1 or 2
> During a Quorum read from nodes A and B, Cassandra creates version 12 and sends a ReadRepair, so the nodes now have the following content:
> NodeA has X.12
> NodeB has X.12
> which is correct. However, Cassandra will also fire a ReadRepair to NodeC. There is no need to do that: the next consistent read will either be served by nodes {A, B} (no ReadRepair) or by the pair {?, C}, in which case a ReadRepair will be fired and bring NodeC to a consistent state.
> Right now we are reading from the index a lot, and starting from some point in time we get TimeoutExceptions because the cluster is overloaded by ReadRepair requests *even* if all nodes have the same data :(
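[Editor's note] The mechanism described above can be reduced to a toy model (hypothetical names, not the real Cassandra API): padding the version list with null for a replica that did not answer makes that replica look divergent from the resolved row, so a repair is scheduled even when all answering replicas already agree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the null-padding problem from CASSANDRA-3843.
// A replica's version of a row is a String; null means "did not answer".
public class RepairSketch {
    // Simplified stand-in for ColumnFamily.diff: returns what the replica
    // is missing, or null when there is nothing to repair.
    static String diff(String version, String resolved) {
        if (version == null) return resolved; // null always "differs"
        return version.equals(resolved) ? null : resolved;
    }

    // Pre-fix behaviour: every null-padded entry produces a diff, so a
    // repair is sent to that replica even though the replicas that
    // actually answered were already consistent.
    static List<Integer> scheduleRepairs(List<String> versions, String resolved) {
        List<Integer> repairs = new ArrayList<>();
        for (int i = 0; i < versions.size(); i++)
            if (diff(versions.get(i), resolved) != null)
                repairs.add(i);
        return repairs;
    }

    public static void main(String[] args) {
        // Quorum read: A and B answered identically; C stayed silent.
        List<String> versions = new ArrayList<>(Arrays.asList("X.12", "X.12"));
        versions.add(null); // padding for the silent replica C
        // The padding makes C look divergent, firing an extra repair.
        System.out.println(scheduleRepairs(versions, "X.12")); // prints [2]
    }
}
```

The committed fix avoids scheduling these repairs; the sketch only illustrates why comparing a null-padded version against the resolved superset always triggers one.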
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-3843) Unnecessary
ReadRepair request during RangeScan
Posted by "Jeremy Hanna (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212725#comment-13212725 ]
Jeremy Hanna edited comment on CASSANDRA-3843 at 2/21/12 4:48 PM:
------------------------------------------------------------------
Thanks - good to know - we'll upgrade to 1.0.8 as soon as we can then.
was (Author: jeromatron):
Good to know - we'll upgrade to 1.0.8 as soon as we can then.
> Unnecessary ReadRepair request during RangeScan
> ------------------------------------------------
>
> Key: CASSANDRA-3843
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3843
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0.0
> Reporter: Philip Andronov
> Assignee: Jonathan Ellis
> Fix For: 1.0.8
>
> Attachments: 3843-v2.txt, 3843.txt
>
>
[jira] [Updated] (CASSANDRA-3843) Unnecessary ReadRepair request
during RangeScan
Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-3843:
--------------------------------------
Attachment: 3843-v2.txt
committed, with the same fix for the 2nd occurrence of the RSRR in 1.0 StorageProxy. v2 attached in case that makes it easier for anyone to test.
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request
during RangeScan
Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207050#comment-13207050 ]
Jonathan Ellis commented on CASSANDRA-3843:
-------------------------------------------
Looks to me like the 1.0 code changes from v2 apply cleanly to 0.8. (CHANGES diff does not apply but can be ignored.)
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request
during RangeScan
Posted by "Philip Andronov (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204600#comment-13204600 ]
Philip Andronov commented on CASSANDRA-3843:
--------------------------------------------
> The null version was added for CASSANDRA-2680.
Oh, good point. Sorry, I should have paid more attention to the git history, not only the annotations :)
Anyway, thanks for the patch; now we can apply the correct patch on our servers.
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request
during RangeScan
Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207828#comment-13207828 ]
Jonathan Ellis commented on CASSANDRA-3843:
-------------------------------------------
... You did patch with v2, right?
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request
during RangeScan
Posted by "Jeremy Hanna (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207577#comment-13207577 ]
Jeremy Hanna commented on CASSANDRA-3843:
-----------------------------------------
I patched the version of 0.8.4 that we use with the change and applied it to all of our staging nodes. However, the problem of writes occurring on a column family that we were only doing range scans of still persists. I major-compacted a column family on all of the nodes, then ran a simple Pig job to read the contents of that CF, and then saw a lot of minor compactions for that column family.