Posted to dev@accumulo.apache.org by David Medinets <da...@gmail.com> on 2012/11/13 19:00:44 UTC

Bug In Text Class - getLength() and getBytes().length are different.

The following code (the TextTest class) displays:

cq: [5000000000000000]
cq: [16]
cq: [16]
cq: [5000000000000000]
cq: [16]
cq: [17]

You'll notice that the last two numbers are different, but they should
both be 16. This bug affects Accumulo because of the following code in
Mutation:

  private void put(byte b[]) {
    buffer.writeVLong(b.length);
    buffer.add(b, 0, b.length);
  }

  private void put(Text t) {
    buffer.writeVLong(t.getLength());
    buffer.add(t.getBytes(), 0, t.getLength());
  }

I should be able to call either of the following to get the same
result but I can't.

  put("5000000000000000".getBytes());
  put(new Text("5000000000000000"));

Has anyone else run into this issue? Any workarounds or fixes?

----

package com.codebits.accumulo;

import org.apache.hadoop.io.Text;

public class TextTest {

  public static void main(String[] args) {
    String s = "5000000000000000";
    System.out.println("cq: [" + s + "]");
    System.out.println("cq: [" + s.length() + "]");
    System.out.println("cq: [" + s.getBytes().length + "]");

    Text cq = new Text(s);
    System.out.println("cq: [" + cq + "]");
    System.out.println("cq: [" + cq.getLength() + "]");
    System.out.println("cq: [" + cq.getBytes().length + "]");
  }

}

Re: Bug In Text Class - getLength() and getBytes().length are different.

Posted by Keith Turner <ke...@deenlo.com>.
On Tue, Nov 13, 2012 at 1:00 PM, David Medinets
<da...@gmail.com> wrote:
> The following code (the TextTest class) displays:
>
> cq: [5000000000000000]
> cq: [16]
> cq: [16]
> cq: [5000000000000000]
> cq: [16]
> cq: [17]
>
> You'll notice that the last two numbers are different, but they should
> both be 16. This bug affects Accumulo because of the following code in

It's OK that it's 17. The byte array Text uses may be longer than the
data. That's why it's important to use Text.getLength().
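To see why using getLength() is enough, here is a minimal sketch of the two Mutation.put paths built only from JDK types. PaddedBytes is a hypothetical stand-in for Text (an oversized backing array plus a count of valid bytes). The 4-byte length prefix written with ByteBuffer.putInt stands in for Accumulo's writeVLong. Neither class is the real Accumulo or Hadoop code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for Hadoop's Text: a backing array that may be
// longer than the data, plus the number of valid bytes at its start.
final class PaddedBytes {
  final byte[] bytes;
  final int length;

  PaddedBytes(byte[] bytes, int length) {
    this.bytes = bytes;
    this.length = length;
  }
}

final class PutSketch {
  // Mirrors put(byte[] b): the whole array is the data.
  static byte[] put(byte[] b) {
    return write(b, b.length);
  }

  // Mirrors put(Text t): only the first `length` bytes are the data.
  static byte[] put(PaddedBytes t) {
    return write(t.bytes, t.length);
  }

  // Length prefix followed by exactly `len` data bytes.
  // putInt is an assumed substitute for Accumulo's writeVLong.
  private static byte[] write(byte[] data, int len) {
    return ByteBuffer.allocate(4 + len).putInt(len).put(data, 0, len).array();
  }

  public static void main(String[] args) {
    byte[] raw = "5000000000000000".getBytes(StandardCharsets.UTF_8);
    // Simulate a 17-byte backing array that holds 16 valid bytes.
    PaddedBytes padded = new PaddedBytes(Arrays.copyOf(raw, 17), raw.length);
    System.out.println(Arrays.equals(put(raw), put(padded)));
  }
}
```

Because the Text-style path writes only the first `length` bytes, the padding byte never reaches the output, so in this sketch both calls serialize to the same 20 bytes (a 4-byte prefix plus 16 data bytes).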

Re: Bug In Text Class - getLength() and getBytes().length are different.

Posted by Marc Parisi <ma...@accumulo.net>.
That's expected. As per
http://docs.oracle.com/javase/6/docs/api/java/nio/Buffer.html, the byte
buffer is created by the encoder, and there is no guarantee that the
backing array's length matches the buffer's position (or limit).
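The encoder behavior can be observed with the JDK alone. This sketch assumes, as the reply describes, that Text's String constructor goes through CharsetEncoder.encode(CharBuffer) and keeps the returned buffer's backing array. The class name EncoderSlack is made up for illustration.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

final class EncoderSlack {
  // Encodes s to UTF-8 with a CharsetEncoder and returns
  // {valid bytes (limit), backing array length}.
  static int[] encodedSizes(String s) {
    try {
      ByteBuffer bb = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(s));
      return new int[] {bb.limit(), bb.array().length};
    } catch (CharacterCodingException e) {
      // CharacterCodingException is an IOException; rethrow it unchecked.
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int[] sizes = encodedSizes("5000000000000000");
    System.out.println("limit: " + sizes[0] + ", array length: " + sizes[1]);
  }
}
```

In OpenJDK, CharsetEncoder.encode sizes its initial buffer as the number of remaining chars times averageBytesPerChar(), which is 1.1 for UTF-8. Sixteen times 1.1 truncates to 17, which would account for the 17 in the original output. Other JDKs may size the buffer differently, so only limit() should be relied on.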

Re: Bug In Text Class - getLength() and getBytes().length are different.

Posted by David Medinets <da...@gmail.com>.
That makes sense. I have resolved my issue by passing the
Text.getLength() value along with Text.getBytes(). This works fine.
Thanks.
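Passing the length alongside the array works whenever the callee accepts an offset and length. When a method takes only a byte[], one option is to copy the valid prefix first. The helper below is a hypothetical sketch, and the simulated 17-byte backing array stands in for what Text.getBytes() returned in the test above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ValidBytes {
  // Returns exactly the valid bytes of a (backing array, length) pair,
  // such as the values returned by Text.getBytes() and Text.getLength().
  static byte[] copyValid(byte[] backing, int length) {
    if (length < 0 || length > backing.length) {
      throw new IllegalArgumentException("length out of range: " + length);
    }
    return Arrays.copyOf(backing, length);
  }

  public static void main(String[] args) {
    byte[] raw = "5000000000000000".getBytes(StandardCharsets.UTF_8);
    // Simulated oversized backing array: 16 valid bytes plus one spare.
    byte[] backing = Arrays.copyOf(raw, 17);
    byte[] valid = copyValid(backing, raw.length);
    System.out.println(valid.length + " " + new String(valid, StandardCharsets.UTF_8));
  }
}
```

The copy costs an allocation, which is what Text's reuse is meant to avoid, so passing getBytes() together with getLength() is usually preferable when the API allows it.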

On Tue, Nov 13, 2012 at 1:23 PM, John Vines <vi...@apache.org> wrote:
> This is not a bug. A Text object is a reusable object which prevents
> repeated creation of byte arrays, so it will use the same byte array,
> resizing it if necessary, and writing over the previous values. Doing any
> operation based on Text.getBytes().length has a strong potential to provide
> inaccurate results. Text.getLength() is the appropriate way to get the
> length of the underlying byte array that you care about.
>
> John

Re: Bug In Text Class - getLength() and getBytes().length are different.

Posted by John Vines <vi...@apache.org>.
This is not a bug. A Text object is meant to be reused, which avoids
repeatedly creating byte arrays: it keeps the same byte array, resizing it
if necessary and writing over the previous values. Any operation based on
Text.getBytes().length is likely to give inaccurate results.
Text.getLength() is the appropriate way to get the length of the portion of
the underlying byte array that you care about.

John
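The reuse described above can be illustrated with a small hypothetical class. ReusableBytes is not Hadoop's Text, and its grow-only policy is an assumption. It only shows the consequence of growing the array when needed and overwriting in place: after a shorter value is set, the array keeps its old length, and stale bytes remain beyond getLength().

```java
import java.nio.charset.StandardCharsets;

// Hypothetical reusable buffer: grows its array only when a value does
// not fit, and otherwise overwrites the existing array in place.
final class ReusableBytes {
  private byte[] bytes = new byte[0];
  private int length;

  void set(byte[] src) {
    if (src.length > bytes.length) {
      bytes = new byte[src.length]; // grow only when needed
    }
    System.arraycopy(src, 0, bytes, 0, src.length);
    length = src.length; // bytes past `length` keep stale data
  }

  byte[] getBytes() { return bytes; }

  int getLength() { return length; }

  public static void main(String[] args) {
    ReusableBytes buf = new ReusableBytes();
    buf.set("5000000000000000".getBytes(StandardCharsets.UTF_8));
    buf.set("42".getBytes(StandardCharsets.UTF_8));
    System.out.println(buf.getLength() + " vs " + buf.getBytes().length);
    // Decoding the whole array includes stale bytes from the first value.
    System.out.println(new String(buf.getBytes(), StandardCharsets.UTF_8));
    // Decoding only getLength() bytes gives the current value.
    System.out.println(new String(buf.getBytes(), 0, buf.getLength(), StandardCharsets.UTF_8));
  }
}
```

Here the second set() leaves the array at 16 bytes, so decoding the whole array yields "4200000000000000", while decoding only getLength() bytes yields "42".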

