
Posted to user@hbase.apache.org by Chase Bradford <ch...@gmail.com> on 2010/06/17 06:19:39 UTC

ImmutableBytesWritable comparator not lexicographic?

Hi Everyone,

I've been trying to track down a problem I'm having with sorting IBWs with
their comparator, and it seems as though the comparator doesn't work as
expected.

The problem seems to be that IBW.Comparator extends WritableComparator but
only overrides compareBytes.  WritableComparator.compare uses IBW.compareTo,
which compares by length first and then by contents, as if aiming for a
big-endian numerical comparison.  It's not quite a numerical comparison
either, because it doesn't account for leading 0 bytes.
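
To make the difference concrete, here is a minimal sketch of the two orderings on plain byte arrays. It does not use or reproduce the HBase source: the class and method names (ByteOrderings, lengthFirst, lexicographic) are made up for illustration, and lengthFirst only models the "length, then contents" behaviour as described above.

```java
final class ByteOrderings {
    // Models the behaviour described above: shorter arrays sort first,
    // and only equal-length arrays are compared byte by byte.
    static int lengthFirst(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return Integer.compare(a.length, b.length);
        }
        return lexicographic(a, b);
    }

    // Lexicographic order: compare unsigned bytes left to right;
    // if one array is a prefix of the other, the shorter one sorts first.
    static int lexicographic(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] one = {0x0f};
        byte[] two = {0x00, 0x00};
        System.out.println(lengthFirst(one, two));   // negative: {0x0f} sorts first
        System.out.println(lexicographic(one, two)); // positive: {0x00, 0x00} sorts first
    }
}
```

The leading-zero point shows up with {0x02} and {0x00, 0x01}: read as big-endian numbers the second is smaller (1 < 2), yet a length-first comparison still puts {0x02} first.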

I stumbled on this while trying to use the TotalOrderPartitioner with a
partition file that is sorted lexicographically but contains values of
varying lengths.  The partitioner uses the Comparator's compare() method.
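
The mismatch can be sketched without Hadoop. The example below is a simplified stand-in, not the TotalOrderPartitioner code: it checks whether a list of split points is strictly increasing under a given comparator. All names (SplitPointCheck, strictlyIncreasing, sampleSplits, and the two comparators) are hypothetical, and the sample split points are invented.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SplitPointCheck {
    // Unsigned, left-to-right byte comparison; a proper prefix sorts first.
    static final Comparator<byte[]> LEXICOGRAPHIC = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    // Length decides first; contents only break ties between equal lengths.
    static final Comparator<byte[]> LENGTH_FIRST = (a, b) ->
        a.length != b.length
            ? Integer.compare(a.length, b.length)
            : LEXICOGRAPHIC.compare(a, b);

    // True if every split point is strictly smaller than its successor.
    static boolean strictlyIncreasing(List<byte[]> splits, Comparator<byte[]> cmp) {
        for (int i = 0; i + 1 < splits.size(); i++) {
            if (cmp.compare(splits.get(i), splits.get(i + 1)) >= 0) {
                return false;
            }
        }
        return true;
    }

    // Invented split points of varying lengths, in lexicographic order.
    static List<byte[]> sampleSplits() {
        return Arrays.asList(
            new byte[]{0x00, 0x00},
            new byte[]{0x0f},
            new byte[]{0x10, 0x00});
    }

    public static void main(String[] args) {
        System.out.println(strictlyIncreasing(sampleSplits(), LEXICOGRAPHIC)); // true
        System.out.println(strictlyIncreasing(sampleSplits(), LENGTH_FIRST));  // false
    }
}
```

A partition file written in lexicographic order therefore looks unsorted to any consumer that compares length first, which is the situation described above.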

Can someone explain why IBW.compareTo is implemented this way?

Thanks,
Chase

My test case:

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;

public class Test
{
   public static void main(String[] args){
      // One byte (0x0f) versus two zero bytes: lexicographically ibw1 > ibw2.
      ImmutableBytesWritable ibw1 = new ImmutableBytesWritable( new byte[]{0x0f} );
      ImmutableBytesWritable ibw2 = new ImmutableBytesWritable( new byte[]{0x00, 0x00} );
      ImmutableBytesWritable.Comparator c = new ImmutableBytesWritable.Comparator();

      // Prints only if the comparator reports ibw1 < ibw2.
      if( c.compare( ibw1, ibw2 ) < 0 )
         System.err.println( "ibw1 < ibw2" );

      System.exit(0);
   }
}

Re: ImmutableBytesWritable comparator not lexicographic?

Posted by Ted Yu <yu...@gmail.com>.

Check the latest fix: http://issues.apache.org/jira/browse/HBASE-2635

On Wed, Jun 16, 2010 at 9:37 PM, Ted Yu <yu...@gmail.com> wrote:

> What version of HBase are you using?

Re: ImmutableBytesWritable comparator not lexicographic?

Posted by Ted Yu <yu...@gmail.com>.

What version of HBase are you using?


Re: ImmutableBytesWritable comparator not lexicographic?

Posted by Chase Bradford <ch...@gmail.com>.

Yep, that ticket looks like it covers it.

Thank you very much.

On Wed, Jun 16, 2010 at 9:55 PM, Stack <st...@duboce.net> wrote:
> On Wed, Jun 16, 2010 at 9:19 PM, Chase Bradford
> <ch...@gmail.com> wrote:
>> I've been trying to track down a problem I'm having with sorting IBWs with
>> their comparator, and it seems as though the comparator doesn't work as
>> expected.
>>
>
> Looks like https://issues.apache.org/jira/browse/HBASE-2378?  Given where
> you are currently digging, you might also be interested in 'HBASE-2635
> ImmutableBytesWritable ignores offset in several cases'.



-- 
Chase Bradford


“If in physics there's something you don't understand, you can always
hide behind the uncharted depths of nature. But if your program
doesn't work, there is no obstinate nature. If it doesn't work, you've
messed up.”

- Edsger Dijkstra

Re: ImmutableBytesWritable comparator not lexicographic?

Posted by Stack <st...@duboce.net>.

On Wed, Jun 16, 2010 at 9:19 PM, Chase Bradford
<ch...@gmail.com> wrote:
> I've been trying to track down a problem I'm having with sorting IBWs with
> their comparator, and it seems as though the comparator doesn't work as
> expected.
>

Looks like https://issues.apache.org/jira/browse/HBASE-2378?  Given where
you are currently digging, you might also be interested in 'HBASE-2635
ImmutableBytesWritable ignores offset in several cases'.
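
For readers unfamiliar with the offset problem named in the HBASE-2635 title, here is a minimal sketch of what honoring an offset and length means when comparing slices of backing arrays. It is an illustration only, not the HBase code or the ticket's patch; SliceCompare and compareSlices are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class SliceCompare {
    // Lexicographic comparison of a[aOff .. aOff+aLen) with b[bOff .. bOff+bLen),
    // treating bytes as unsigned. Only the addressed slices are inspected.
    static int compareSlices(byte[] a, int aOff, int aLen,
                             byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xff;
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen;
    }

    public static void main(String[] args) {
        byte[] backing = "xxabc".getBytes(StandardCharsets.US_ASCII);
        byte[] ab = "ab".getBytes(StandardCharsets.US_ASCII);

        // Honoring the offset: the slice at offset 2, length 2 is "ab".
        System.out.println(compareSlices(backing, 2, 2, ab, 0, 2)); // 0

        // Ignoring the offset compares "xx" with "ab" instead.
        System.out.println(compareSlices(backing, 0, 2, ab, 0, 2) > 0); // true
    }
}
```

A comparison that starts at index 0 of the backing array regardless of the offset would look at "xx" rather than "ab" here, which gives a different ordering for the same logical value.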

I added your test code below as a unit test to the 0.20.5 release
candidate and it passed (see below).

Hopefully this is the issue you are running into.  Ping if not.  Sorry
for the inconvenience.
St.Ack


diff --git a/src/test/java/org/apache/hadoop/hbase/io/TestImmutableBytesWritable.java
b/src/test/java/org/apache/hadoop/hbase/io/TestImmutableBytesWritab
index 43fa6dd..77c4506 100644
--- a/src/test/java/org/apache/hadoop/hbase/io/TestImmutableBytesWritable.java
+++ b/src/test/java/org/apache/hadoop/hbase/io/TestImmutableBytesWritable.java
@@ -40,6 +40,13 @@ public class TestImmutableBytesWritable extends TestCase {
       new ImmutableBytesWritable(Bytes.toBytes("xxabc"), 2, 2).hashCode());
   }

+  public void testSpecificCompare() {
+    ImmutableBytesWritable ibw1 = new ImmutableBytesWritable(new byte[]{0x0f});
+    ImmutableBytesWritable ibw2 = new ImmutableBytesWritable(new byte[]{0x00, 0x00});
+    ImmutableBytesWritable.Comparator c = new ImmutableBytesWritable.Comparator();
+    assertFalse("ibw1 < ibw2", c.compare( ibw1, ibw2 ) < 0 );
+  }
+
   public void testComparison() throws Exception {
     runTests("aa", "b", -1);
     runTests("aa", "aa", 0);



