Posted to user@cassandra.apache.org by Schubert Zhang <zs...@gmail.com> on 2010/11/11 09:19:22 UTC

MerkleTree.RowHash may be a bug.

Hi JE,

0.6.6:
org.apache.cassandra.service.AntiEntropyService

I found that the rowHash method uses "row.buffer.getData()" directly.
Since row.buffer.getData() returns the buffer's backing byte[], which may
contain junk bytes at the end beyond the valid data, I think we should hash
only the exact length.

        private MerkleTree.RowHash rowHash(CompactedRow row)
        {
            validated++;
            // MerkleTree uses XOR internally, so we want lots of output bits here
            byte[] rowhash = FBUtilities.hash("SHA-256", row.key.key.getBytes(), row.buffer.getData());
            return new MerkleTree.RowHash(row.key.token, rowhash);
        }
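To see the concern concretely, here is a minimal, self-contained sketch of the issue. It uses ByteArrayOutputStream and MessageDigest as stand-ins for Cassandra's DataOutputBuffer and FBUtilities.hash (those stand-ins and the ExposedBuffer class are illustrative, not Cassandra code): hashing the raw backing array ties the digest to the buffer's capacity, while hashing only the valid length depends on the data alone.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.util.Arrays;

public class BufferHashDemo {
    // Analogous to DataOutputBuffer.getData(): the returned array is the raw
    // backing store and may be longer than the bytes actually written.
    static class ExposedBuffer extends ByteArrayOutputStream {
        ExposedBuffer(int capacity) { super(capacity); }
        byte[] getData() { return buf; }   // raw backing array, may include a junk tail
        int getLength() { return count; }  // number of valid bytes
    }

    static byte[] sha256(byte[] data, int length) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(data, 0, length);        // hash only the given prefix
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] row = "row-data".getBytes("UTF-8");

        // Same logical contents, different backing-array capacities.
        ExposedBuffer small = new ExposedBuffer(16);
        ExposedBuffer big = new ExposedBuffer(64);
        small.write(row);
        big.write(row);

        // Hashing the whole backing array makes the digest depend on capacity...
        System.out.println("full-array digests equal: " + Arrays.equals(
                sha256(small.getData(), small.getData().length),
                sha256(big.getData(), big.getData().length)));

        // ...while hashing only getLength() bytes depends on the data alone.
        System.out.println("exact-length digests equal: " + Arrays.equals(
                sha256(small.getData(), small.getLength()),
                sha256(big.getData(), big.getLength())));
    }
}
```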


schubert.zhang@gmail.com

Re: MerkleTree.RowHash may be a bug.

Posted by Schubert Zhang <zs...@gmail.com>.
Hi Stu Hood,

Yes, it may not result in extra repair, since the excess bytes of the buffer
may be the same on different machines, e.g. all zero bytes.
But that depends on how the JDK (ByteArrayOutputStream) allocates memory, so
it is a risk across different JDK versions.

In fact, we have added a compression feature to cassandra-0.6 in our
product, and more than one buffer object is used there; it resulted in a
real problem.
Schubert
On Fri, Nov 12, 2010 at 1:31 AM, Stu Hood <st...@rackspace.com> wrote:

> At first glance, this appeared to be a very egregious bug, but the effect
> is actually minimal: since the size of the buffer is deterministic based on
> the size of the data, you will have equal amounts of excess/junk data for
> equal rows. Combined with the fact that 0.6 doesn't reuse these buffers, I
> don't think we're actually doing any extra repair.
>
> The problem is fixed in 0.7, but I've opened CASSANDRA-1729 to fix it in
> 0.6, in case we start reusing row buffers.
>
> Thanks for the report!
> Stu
>
> -----Original Message-----
> From: "Schubert Zhang" <zs...@gmail.com>
> Sent: Thursday, November 11, 2010 2:19am
> To: dev@cassandra.apache.org, user@cassandra.apache.org
> Subject: MerkleTree.RowHash may be a bug.
>
> Hi JE,
>
> 0.6.6:
> org.apache.cassandra.service.AntiEntropyService
>
> I found that the rowHash method uses "row.buffer.getData()" directly.
> Since row.buffer.getData() returns the buffer's backing byte[], which may
> contain junk bytes at the end beyond the valid data, I think we should hash
> only the exact length.
>
>        private MerkleTree.RowHash rowHash(CompactedRow row)
>        {
>            validated++;
>            // MerkleTree uses XOR internally, so we want lots of output bits here
>            byte[] rowhash = FBUtilities.hash("SHA-256", row.key.key.getBytes(), row.buffer.getData());
>            return new MerkleTree.RowHash(row.key.token, rowhash);
>        }
>
>
> schubert.zhang@gmail.com
>
>
>

RE: MerkleTree.RowHash may be a bug.

Posted by Stu Hood <st...@rackspace.com>.
At first glance, this appeared to be a very egregious bug, but the effect is actually minimal: since the size of the buffer is deterministic based on the size of the data, you will have equal amounts of excess/junk data for equal rows. Combined with the fact that 0.6 doesn't reuse these buffers, I don't think we're actually doing any extra repair.

The problem is fixed in 0.7, but I've opened CASSANDRA-1729 to fix it in 0.6, in case we start reusing row buffers.
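The buffer-reuse caveat can be illustrated with a hedged sketch (ByteArrayOutputStream stands in for the row buffer; ExposedBuffer and the row contents are made up for illustration): if a buffer that previously held a longer row is reset and reused, its backing array still carries stale tail bytes, so hashing the full array would disagree with a node that used a fresh buffer, while hashing only the valid length agrees.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.util.Arrays;

public class ReuseDemo {
    // Exposes the raw backing array and the count of valid bytes,
    // analogous to a reusable row buffer.
    static class ExposedBuffer extends ByteArrayOutputStream {
        byte[] getData() { return buf; }
        int getLength() { return count; }
    }

    static byte[] sha256(byte[] data, int length) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(data, 0, length);
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        // Node A: fresh buffer for the short row.
        ExposedBuffer fresh = new ExposedBuffer();
        fresh.write("short".getBytes("UTF-8"));

        // Node B: buffer previously used for a longer row, then reset and reused.
        ExposedBuffer reused = new ExposedBuffer();
        reused.write("a-much-longer-previous-row".getBytes("UTF-8"));
        reused.reset();                     // count = 0, but buf still holds old bytes
        reused.write("short".getBytes("UTF-8"));

        // Hashing the raw backing arrays disagrees: stale tail bytes leak in.
        System.out.println("full-array match: " + Arrays.equals(
                sha256(fresh.getData(), fresh.getData().length),
                sha256(reused.getData(), reused.getData().length)));

        // Hashing only getLength() bytes agrees, as it should.
        System.out.println("exact-length match: " + Arrays.equals(
                sha256(fresh.getData(), fresh.getLength()),
                sha256(reused.getData(), reused.getLength())));
    }
}
```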

Thanks for the report!
Stu

-----Original Message-----
From: "Schubert Zhang" <zs...@gmail.com>
Sent: Thursday, November 11, 2010 2:19am
To: dev@cassandra.apache.org, user@cassandra.apache.org
Subject: MerkleTree.RowHash may be a bug.

Hi JE,

0.6.6:
org.apache.cassandra.service.AntiEntropyService

I found that the rowHash method uses "row.buffer.getData()" directly.
Since row.buffer.getData() returns the buffer's backing byte[], which may
contain junk bytes at the end beyond the valid data, I think we should hash
only the exact length.

        private MerkleTree.RowHash rowHash(CompactedRow row)
        {
            validated++;
            // MerkleTree uses XOR internally, so we want lots of output bits here
            byte[] rowhash = FBUtilities.hash("SHA-256", row.key.key.getBytes(), row.buffer.getData());
            return new MerkleTree.RowHash(row.key.token, rowhash);
        }


schubert.zhang@gmail.com


