Posted to dev@hive.apache.org by "Navis (Created) (JIRA)" <ji...@apache.org> on 2012/03/25 06:45:42 UTC

[jira] [Created] (HIVE-2903) Numeric binary type keys are not compared properly

Numeric binary type keys are not compared properly
--------------------------------------------------

                 Key: HIVE-2903
                 URL: https://issues.apache.org/jira/browse/HIVE-2903
             Project: Hive
          Issue Type: Bug
          Components: HBase Handler
            Reporter: Navis
            Assignee: Navis


In the current binary format for numbers, negative values always compare greater than positive values. For example:
{code}
System.out.println(Bytes.compareTo(Bytes.toBytes(-100), Bytes.toBytes(100))); // 255
{code}
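
The result can be reproduced without an HBase dependency. The sketch below is a minimal stand-in, assuming that Bytes.toBytes(int) writes four big-endian bytes and that Bytes.compareTo compares bytes as unsigned values and returns their difference at the first mismatch. The class and method names are illustrative and are not HBase's own.

```java
import java.nio.ByteBuffer;

final class UnsignedCompareDemo {
    // Stand-in for Bytes.toBytes(int): four bytes, most significant byte first.
    static byte[] toBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Stand-in for Bytes.compareTo: byte-wise comparison, each byte read as unsigned.
    static int compareTo(byte[] left, byte[] right) {
        int n = Math.min(left.length, right.length);
        for (int i = 0; i < n; i++) {
            int a = left[i] & 0xff;
            int b = right[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return left.length - right.length;
    }

    public static void main(String[] args) {
        // -100 is 0xFFFFFF9C and 100 is 0x00000064, so the first bytes are 0xFF and 0x00.
        System.out.println(compareTo(toBytes(-100), toBytes(100))); // 255
    }
}
```

The sign bit of a two's-complement negative number is 1, so its leading byte is at least 0x80 and sorts after every non-negative number's leading byte.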

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2903) Numeric binary type keys are not compared properly

Posted by "Navis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239060#comment-13239060 ] 

Navis commented on HIVE-2903:
-----------------------------

@Ashutosh Chauhan,
You are right. This is just a Hive-specific binary format, not a general solution, so other non-Hive HBase clients would need to know about it a priori in order to use it. (Should this be a patch for HBase?)
But without this, rows with negative keys pop up unexpectedly, especially after applying HIVE-2897. I don't know how this should be handled.
                

        

[jira] [Commented] (HIVE-2903) Numeric binary type keys are not compared properly

Posted by "Ashutosh Chauhan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239041#comment-13239041 ] 

Ashutosh Chauhan commented on HIVE-2903:
----------------------------------------

@Navis,
The way you have fixed it, it will work only if data is written from Hive into HBase and then queried from the Hive client against HBase. What if the data was written into HBase through an HBase client and then queried from the Hive client? This bug will still be there, won't it?
This also makes me wonder whether this problem is not limited to Hive but applies to HBase in general. If you write data through an HBase client and then do range scans, you will hit the same bug. There must be some solution in the HBase space for this.
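
The range-scan concern can be illustrated with a small in-memory model. This is only a sketch: a TreeMap ordered by unsigned byte comparison stands in for an HBase table, and toBytes, scanFrom and the comparator are hypothetical helpers, not HBase or Hive APIs.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

final class RangeScanDemo {
    // Big-endian four-byte encoding, assumed to match Bytes.toBytes(int).
    static byte[] toBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Lexicographic comparison with bytes read as unsigned, as row keys are ordered.
    static final Comparator<byte[]> UNSIGNED_LEXICOGRAPHIC = (left, right) -> {
        int n = Math.min(left.length, right.length);
        for (int i = 0; i < n; i++) {
            int diff = (left[i] & 0xff) - (right[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return left.length - right.length;
    };

    // Models an open-ended scan starting at startKey, i.e. the predicate "key >= startKey".
    static List<Integer> scanFrom(int[] keys, int startKey) {
        TreeMap<byte[], Integer> table = new TreeMap<>(UNSIGNED_LEXICOGRAPHIC);
        for (int key : keys) {
            table.put(toBytes(key), key);
        }
        return new ArrayList<>(table.tailMap(toBytes(startKey), true).values());
    }

    public static void main(String[] args) {
        // Negative keys sort after all non-negative keys, so they leak into the scan.
        System.out.println(scanFrom(new int[] {-100, -1, 0, 1, 100}, 1)); // [1, 100, -100, -1]
    }
}
```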
                

        

[jira] [Updated] (HIVE-2903) Numeric binary type keys are not compared properly

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-2903:
------------------------------

    Attachment: HIVE-2903.D2481.2.patch

navis updated the revision "HIVE-2903 [jira] Numeric binary type keys are not compared properly".
Reviewers: JIRA

  1. Added a separate storage option 'lbinary' for handling negative values.
  2. Added COLUMNS and COLUMN_TYPES to the HBase table job properties.

REVISION DETAIL
  https://reviews.facebook.net/D2481

AFFECTED FILES
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java
  hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java
  hbase-handler/src/test/queries/external_table_ppd.q
  hbase-handler/src/test/results/external_table_ppd.q.out

                

        

[jira] [Updated] (HIVE-2903) Numeric binary type keys are not compared properly

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-2903:
------------------------------

    Attachment: HIVE-2903.D2481.1.patch

navis requested code review of "HIVE-2903 [jira] Numeric binary type keys are not compared properly".
Reviewers: JIRA

  DPAL-1007 Numeric binary type keys are not compared properly

  In the current binary format for numbers, negative values always compare greater than positive values. For example:

  System.out.println(Bytes.compareTo(Bytes.toBytes(-100), Bytes.toBytes(100))); // 255

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D2481

AFFECTED FILES
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java
  hbase-handler/src/test/queries/external_table_ppd.q
  hbase-handler/src/test/results/external_table_ppd.q.out


        

[jira] [Commented] (HIVE-2903) Numeric binary type keys are not compared properly

Posted by "Navis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240800#comment-13240800 ] 

Navis commented on HIVE-2903:
-----------------------------

IMHO, it would be better to handle this in HBase rather than in Hive. Until then, I've separated the code behind the option 'lbinary' ('signedbinary' conflicts with 'string', which also starts with 's').
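
The naming conflict can be sketched as follows, under the assumption (implied by the comment, not confirmed here) that a storage option may be abbreviated to any prefix of its name, such as "#s" or "#b". The candidates helper is hypothetical and is not the Hive parsing code.

```java
import java.util.ArrayList;
import java.util.List;

final class StorageOptionPrefixDemo {
    // Returns every option name that the given abbreviation could stand for.
    static List<String> candidates(String abbreviation, String... optionNames) {
        List<String> matches = new ArrayList<>();
        for (String name : optionNames) {
            if (name.startsWith(abbreviation)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // With 'signedbinary', the abbreviation "s" becomes ambiguous.
        System.out.println(candidates("s", "string", "binary", "signedbinary")); // [string, signedbinary]
        // With 'lbinary', every single-letter abbreviation stays unique.
        System.out.println(candidates("s", "string", "binary", "lbinary")); // [string]
        System.out.println(candidates("l", "string", "binary", "lbinary")); // [lbinary]
    }
}
```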
                

        

[jira] [Commented] (HIVE-2903) Numeric binary type keys are not compared properly

Posted by "Enis Soztutar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239088#comment-13239088 ] 

Enis Soztutar commented on HIVE-2903:
-------------------------------------

Well, it is not a "bug" in HBase. HBase only provides the int -> byte[] conversion as a convenience, and it seems that Bytes.toBytes(int) and its siblings only guarantee lexicographic ordering for unsigned numbers. We could definitely add something like Bytes.toSignedBytes() to HBase so that signed numbers sort correctly in lexicographic order.
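
One common way to get such an ordering is to flip the sign bit before writing the big-endian bytes. The sketch below assumes that technique. toSignedBytes and fromSignedBytes are hypothetical names, and this is not necessarily the encoding used by the attached patch or by any HBase release.

```java
import java.nio.ByteBuffer;

final class SignedBytesSketch {
    // Hypothetical order-preserving encoding: flip the sign bit, then write big-endian.
    static byte[] toSignedBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value ^ Integer.MIN_VALUE).array();
    }

    // Inverse of toSignedBytes: read big-endian, then flip the sign bit back.
    static int fromSignedBytes(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt() ^ Integer.MIN_VALUE;
    }

    // Lexicographic comparison with bytes read as unsigned.
    static int compareUnsigned(byte[] left, byte[] right) {
        int n = Math.min(left.length, right.length);
        for (int i = 0; i < n; i++) {
            int diff = (left[i] & 0xff) - (right[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return left.length - right.length;
    }

    public static void main(String[] args) {
        // -100 encodes to 0x7FFFFF9C and 100 to 0x80000064, so -100 now sorts first.
        System.out.println(compareUnsigned(toSignedBytes(-100), toSignedBytes(100)) < 0); // true
        System.out.println(fromSignedBytes(toSignedBytes(-100))); // -100
    }
}
```

Data already written with Bytes.toBytes() would not be readable through this encoding, which is why it would need to sit behind a separate storage type.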

As for Hive, I think Ashutosh is right that we have to keep supporting existing data in HBase that was serialized through Bytes.toBytes(). So I would suggest adding another storage type (hbase.table.default.storage.type), like "signedbinary", which would do the Hive-specific signed byte conversion.

So, we would have: 
 - cf:col#string       : serialize as string
 - cf:col#binary       : serialize as binary, compatible with Bytes.toBytes() 
 - cf:col#signedbinary : serialize as signed binary. 

I would also note that people might be interested in a custom ser/de from Hive types to byte[], but I am not sure how feasible that would be to implement.
                

        

[jira] [Updated] (HIVE-2903) Numeric binary type keys are not compared properly

Posted by "Navis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-2903:
------------------------

    Status: Patch Available  (was: Open)

Passed all tests.
                

        

[jira] [Commented] (HIVE-2903) Numeric binary type keys are not compared properly

Posted by "Ashutosh Chauhan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239075#comment-13239075 ] 

Ashutosh Chauhan commented on HIVE-2903:
----------------------------------------

Yeah, I am having a hard time believing that HBase lets you do this. I am not sure whether the bug is present in some form in HBase, but your experiment does suggest it's there. If it is, then it certainly makes sense to patch HBase instead of special-casing it ourselves.
                