Posted to dev@hive.apache.org by "Navis (Created) (JIRA)" <ji...@apache.org> on 2012/03/25 06:45:42 UTC
[jira] [Created] (HIVE-2903) Numeric binary type keys are not
compared properly
Numeric binary type keys are not compared properly
--------------------------------------------------
Key: HIVE-2903
URL: https://issues.apache.org/jira/browse/HIVE-2903
Project: Hive
Issue Type: Bug
Components: HBase Handler
Reporter: Navis
Assignee: Navis
In the current binary format for numbers, negative values always compare greater than positive values. For example:
{code}
System.out.println(Bytes.compareTo(Bytes.toBytes(-100), Bytes.toBytes(100))); // 255
{code}
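The same effect can be reproduced without the HBase dependency. The sketch below assumes that Bytes.toBytes(int) writes a 4-byte big-endian two's-complement value and that Bytes.compareTo compares arrays lexicographically, byte by byte, as unsigned values. The class and method names are made up for illustration only.

```java
import java.nio.ByteBuffer;

class RawIntKeyOrder {
    // Big-endian two's-complement encoding (assumed equivalent to Bytes.toBytes(int)).
    static byte[] toBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Lexicographic comparison that treats each byte as unsigned.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // -100 is 0xFFFFFF9C and 100 is 0x00000064, so the first byte decides: 0xFF - 0x00.
        System.out.println(compare(toBytes(-100), toBytes(100))); // 255
        // Non-negative values still compare in numeric order.
        System.out.println(compare(toBytes(1), toBytes(100)) < 0); // true
    }
}
```

The sign bit lives in the first byte, so any negative int starts with a byte of 0x80 or above and sorts after every non-negative int.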
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2903) Numeric binary type keys are not
compared properly
Posted by "Navis (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239060#comment-13239060 ]
Navis commented on HIVE-2903:
-----------------------------
@Ashutosh Chauhan,
You are right. This is just a Hive-specific binary format and not a general solution, so non-Hive HBase clients would need to know about it a priori in order to use it. (Should this be a patch for HBase?)
But without this, rows with negative keys show up in results where they should not, especially after applying HIVE-2897. I don't know how this should be handled.
[jira] [Commented] (HIVE-2903) Numeric binary type keys are not
compared properly
Posted by "Ashutosh Chauhan (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239041#comment-13239041 ]
Ashutosh Chauhan commented on HIVE-2903:
----------------------------------------
@Navis,
The way you have fixed it, it will work only if data is written from Hive into HBase and queries are then run from the Hive client against HBase. What if the data was written to HBase through the HBase client and then queried from the Hive client? This bug will still be there, won't it?
This also makes me wonder whether this problem is not limited to Hive but applies to HBase in general. If you write data through the HBase client and then do range scans, you will have the same bug. There must be some solution in the HBase space for this.
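To make the range-scan concern concrete, here is a small sketch that orders int keys the way a store sorted by unsigned byte-wise comparison would. It assumes keys are written as 4-byte big-endian values, as Bytes.toBytes(int) does. RawKeyScanOrder and its methods are illustrative names only, and a TreeMap stands in for the table.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

class RawKeyScanOrder {
    // Big-endian two's-complement encoding (assumed equivalent to Bytes.toBytes(int)).
    static byte[] encode(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Returns the keys in the order a full scan over the sorted row keys would visit them.
    static List<Integer> scanOrder(int... keys) {
        // Arrays.compareUnsigned (Java 9+) compares byte[] lexicographically as unsigned bytes.
        Comparator<byte[]> unsignedOrder = Arrays::compareUnsigned;
        TreeMap<byte[], Integer> table = new TreeMap<>(unsignedOrder);
        for (int k : keys) {
            table.put(encode(k), k);
        }
        return new ArrayList<>(table.values());
    }

    public static void main(String[] args) {
        System.out.println(scanOrder(-2, -1, 0, 1, 2)); // [0, 1, 2, -2, -1]
    }
}
```

With this ordering, a scan whose start key is a negative number and whose stop key is a positive number has a start row that sorts after its stop row, which is why the problem is independent of which client wrote the data.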
[jira] [Updated] (HIVE-2903) Numeric binary type keys are not
compared properly
Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-2903:
------------------------------
Attachment: HIVE-2903.D2481.2.patch
navis updated the revision "HIVE-2903 [jira] Numeric binary type keys are not compared properly".
Reviewers: JIRA
1. Added a separate storage option 'lbinary' for handling negative values.
2. Added COLUMNS and COLUMN_TYPES to the HBase table job properties.
REVISION DETAIL
https://reviews.facebook.net/D2481
AFFECTED FILES
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java
hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java
hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java
hbase-handler/src/test/queries/external_table_ppd.q
hbase-handler/src/test/results/external_table_ppd.q.out
[jira] [Updated] (HIVE-2903) Numeric binary type keys are not
compared properly
Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-2903:
------------------------------
Attachment: HIVE-2903.D2481.1.patch
navis requested code review of "HIVE-2903 [jira] Numeric binary type keys are not compared properly".
Reviewers: JIRA
DPAL-1007 Numeric binary type keys are not compared properly
In the current binary format for numbers, negative values always compare greater than positive values. For example:
System.out.println(Bytes.compareTo(Bytes.toBytes(-100), Bytes.toBytes(100))); // 255
TEST PLAN
EMPTY
REVISION DETAIL
https://reviews.facebook.net/D2481
AFFECTED FILES
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java
hbase-handler/src/test/queries/external_table_ppd.q
hbase-handler/src/test/results/external_table_ppd.q.out
[jira] [Commented] (HIVE-2903) Numeric binary type keys are not
compared properly
Posted by "Navis (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240800#comment-13240800 ]
Navis commented on HIVE-2903:
-----------------------------
It would be better to handle this in HBase rather than in Hive, IMHO. Until then, I've separated the code behind the option 'lbinary' ('signedbinary' conflicts with 'string', which also starts with 's').
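The naming conflict can be illustrated with a small sketch. It assumes that the storage type in a column mapping may be abbreviated to any prefix of its name (for example 's' for 'string'), which is what the remark above implies. StorageTypeResolver is a hypothetical class written for this illustration, not the HBaseSerDe code.

```java
import java.util.ArrayList;
import java.util.List;

class StorageTypeResolver {
    // Returns every known storage type that the given abbreviation could refer to.
    static List<String> candidates(String abbrev, String... knownTypes) {
        List<String> matches = new ArrayList<>();
        for (String type : knownTypes) {
            if (type.startsWith(abbrev)) {
                matches.add(type);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // 's' becomes ambiguous once 'signedbinary' exists alongside 'string'.
        System.out.println(candidates("s", "string", "binary", "signedbinary")); // [string, signedbinary]
        // With 'lbinary', every single-letter abbreviation stays unique.
        System.out.println(candidates("s", "string", "binary", "lbinary")); // [string]
        System.out.println(candidates("l", "string", "binary", "lbinary")); // [lbinary]
    }
}
```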
[jira] [Commented] (HIVE-2903) Numeric binary type keys are not
compared properly
Posted by "Enis Soztutar (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239088#comment-13239088 ]
Enis Soztutar commented on HIVE-2903:
-------------------------------------
Well, it is not a "bug" in HBase. HBase only provides the int -> byte[] conversion as a convenience, and it seems that Bytes.toBytes(int) and the other overloads only guarantee lexicographic ordering for unsigned numbers. We can definitely add something like Bytes.toSignedBytes() in HBase so that signed numbers sort correctly in lexicographic order.
Coming to Hive, I think Ashutosh is right that we have to keep supporting existing data in HBase that was serialized through Bytes.toBytes(). So I would suggest we add another storage type (hbase.table.default.storage.type), like "signedbinary", which would do the Hive-specific signed byte conversion.
So, we would have:
- cf:col#string : serialize as string
- cf:col#binary : serialize as binary, compatible with Bytes.toBytes()
- cf:col#signedBinary : serialize as signed binary.
I would also suggest that people might be interested in custom ser/de from Hive types to byte[], but I am not sure how feasible that would be to implement.
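One common way to get a signed, order-preserving encoding is to flip the sign bit before writing the big-endian bytes, so that negative values sort below non-negative ones under unsigned byte-wise comparison. The sketch below only illustrates that idea: toSignedBytes and fromSignedBytes are hypothetical helpers, not existing HBase or Hive methods, and the attached patch may encode values differently.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class SignedKeyCodec {
    // Hypothetical order-preserving encoding: flip the sign bit, then write big-endian.
    static byte[] toSignedBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    // Inverse of toSignedBytes: read big-endian, then flip the sign bit back.
    static int fromSignedBytes(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        // Unsigned byte-wise order (Arrays.compareUnsigned, Java 9+) now agrees with numeric order.
        System.out.println(Arrays.compareUnsigned(toSignedBytes(-100), toSignedBytes(100)) < 0); // true
        // The encoding round-trips.
        System.out.println(fromSignedBytes(toSignedBytes(-100))); // -100
    }
}
```

Data already written with Bytes.toBytes() cannot be read back with such a decoder, which is why this would have to be a separate storage type rather than a change to the existing binary one.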
[jira] [Updated] (HIVE-2903) Numeric binary type keys are not
compared properly
Posted by "Navis (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Navis updated HIVE-2903:
------------------------
Status: Patch Available (was: Open)
Passed all tests.
[jira] [Commented] (HIVE-2903) Numeric binary type keys are not
compared properly
Posted by "Ashutosh Chauhan (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239075#comment-13239075 ]
Ashutosh Chauhan commented on HIVE-2903:
----------------------------------------
Yeah, I am having a hard time believing that HBase lets you do this. I am not sure whether the bug is present in some form in HBase, but your experiment does suggest it is there. If it is, then it certainly makes sense to patch HBase instead of special-casing it ourselves.