Posted to user@cassandra.apache.org by Schubert Zhang <zs...@gmail.com> on 2010/11/11 09:19:22 UTC
MerkleTree.RowHash maybe a bug.
Hi JE,
0.6.6:
org.apache.cassandra.service.AntiEntropyService
I found that the rowHash method uses "row.buffer.getData()" directly.
Since row.buffer.getData() returns a byte[] that may contain junk bytes
past the end of the valid data, I think we should use the exact length instead.
private MerkleTree.RowHash rowHash(CompactedRow row)
{
    validated++;
    // MerkleTree uses XOR internally, so we want lots of output bits here
    byte[] rowhash = FBUtilities.hash("SHA-256", row.key.key.getBytes(), row.buffer.getData());
    return new MerkleTree.RowHash(row.key.token, rowhash);
}
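[Editor's note: a minimal, self-contained sketch of the junk-bytes issue. ExposedBuffer is a hypothetical stand-in that mimics Cassandra's DataOutputBuffer by exposing ByteArrayOutputStream's protected backing array and count, which is not the actual Cassandra class:]

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Stand-in for Cassandra's DataOutputBuffer: exposes the raw backing
// array (which may be longer than the written data) and the exact
// number of valid bytes.
class ExposedBuffer extends ByteArrayOutputStream {
    ExposedBuffer(int capacity) { super(capacity); }
    byte[] getData()   { return buf; }   // raw array, may contain junk past count
    int    getLength() { return count; } // number of valid bytes
}

public class BufferJunkDemo {
    public static void main(String[] args) throws Exception {
        ExposedBuffer b = new ExposedBuffer(16);
        b.write("row".getBytes("UTF-8")); // 3 valid bytes in a 16-byte array
        byte[] raw   = b.getData();
        byte[] exact = Arrays.copyOf(raw, b.getLength());
        System.out.println(raw.length);   // 16: includes 13 junk bytes
        System.out.println(exact.length); // 3: only the bytes actually written
    }
}
```

Hashing `raw` instead of `exact` mixes those trailing bytes into the Merkle tree hash.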
schubert.zhang@gmail.com
Re: MerkleTree.RowHash maybe a bug.
Posted by Schubert Zhang <zs...@gmail.com>.
Hi Stu Hood,
Yes, it may not result in extra repair, since the excess bytes in the buffer
may be the same on different machines, e.g. all zero bytes.
But that depends on how the JDK (ByteArrayOutputStream) allocates memory, so
it is a risk across different JDK versions.
In fact, we have added a compression feature to cassandra-0.6 in our
product, and more than one buffer object is reused there; it caused a real
problem for us.
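[Editor's note: a sketch of why buffer reuse makes this bite. ReusedBuffer is a hypothetical stand-in, not a Cassandra class; reset() rewinds the write position but does not clear the backing array, so bytes from the previous row survive past the valid length:]

```java
import java.io.ByteArrayOutputStream;

// Stand-in for a reused row buffer: reset() sets count back to 0 but
// leaves the backing array untouched, so old row bytes remain in it.
class ReusedBuffer extends ByteArrayOutputStream {
    byte[] getData()   { return buf; }
    int    getLength() { return count; }
}

public class ReuseJunkDemo {
    public static void main(String[] args) throws Exception {
        ReusedBuffer b = new ReusedBuffer();
        b.write("a-long-previous-row".getBytes("UTF-8")); // 19 bytes
        b.reset();                                        // count = 0, buf untouched
        b.write("short".getBytes("UTF-8"));               // 5 valid bytes
        // Bytes past getLength() are leftovers from the previous row, so
        // hashing the whole backing array mixes in stale, replica-dependent data.
        String tail = new String(b.getData(), b.getLength(), 14, "UTF-8");
        System.out.println(tail); // "g-previous-row"
    }
}
```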
Schubert
On Fri, Nov 12, 2010 at 1:31 AM, Stu Hood <st...@rackspace.com> wrote:
> At first glance, this appeared to be a very egregious bug, but the effect
> is actually minimal: since the size of the buffer is deterministic based on
> the size of the data, you will have equal amounts of excess/junk data for
> equal rows. Combined with the fact that 0.6 doesn't reuse these buffers, I
> don't think we're actually doing any extra repair.
>
> The problem is fixed in 0.7, but I've opened CASSANDRA-1729 to fix it in
> 0.6, in case we start reusing row buffers.
>
> Thanks for the report!
> Stu
RE: MerkleTree.RowHash maybe a bug.
Posted by Stu Hood <st...@rackspace.com>.
At first glance, this appeared to be a very egregious bug, but the effect is actually minimal: since the size of the buffer is deterministic based on the size of the data, you will have equal amounts of excess/junk data for equal rows. Combined with the fact that 0.6 doesn't reuse these buffers, I don't think we're actually doing any extra repair.
The problem is fixed in 0.7, but I've opened CASSANDRA-1729 to fix it in 0.6, in case we start reusing row buffers.
Thanks for the report!
Stu
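[Editor's note: the fix amounts to hashing only the valid prefix of the buffer. A minimal sketch of the idea using plain java.security.MessageDigest rather than Cassandra's FBUtilities, so hashExact and the class name are illustrative stand-ins, not the actual CASSANDRA-1729 patch:]

```java
import java.security.MessageDigest;
import java.util.Arrays;

public class ExactLengthHash {
    // Hash the row key plus only the first 'length' bytes of 'data',
    // ignoring any junk past the end of the valid region.
    static byte[] hashExact(byte[] key, byte[] data, int length) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(key);
        md.update(data, 0, length); // same result as hashing Arrays.copyOf(data, length)
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] key = "rowkey".getBytes("UTF-8");
        byte[] a = {1, 2, 3, 0, 0, 0, 0, 0}; // 3 valid bytes + zero padding
        byte[] b = {1, 2, 3, 9, 9, 9, 9, 9}; // same 3 valid bytes + different junk
        // Hashing only the exact length agrees; hashing the full arrays does not.
        System.out.println(Arrays.equals(hashExact(key, a, 3), hashExact(key, b, 3)));               // true
        System.out.println(Arrays.equals(hashExact(key, a, a.length), hashExact(key, b, b.length))); // false
    }
}
```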