You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Phabricator (JIRA)" <ji...@apache.org> on 2013/06/14 06:34:21 UTC
[jira] [Updated] (HIVE-4730) Join on more than 2^31 records on
single reducer failed (wrong results)
[ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-4730:
------------------------------
Attachment: HIVE-4730.D11283.1.patch
navis requested code review of "HIVE-4730 [jira] Join on more than 2^31 records on single reducer failed (wrong results)".
Reviewers: JIRA
HIVE-4730 Join on more than 2^31 records on single reducer failed (wrong results)
join on more than 2^31 rows leads to wrong results. for example:
Create table small_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
Create table big_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
Loading 1 row to small_table (the value 1).
Loading 2149580800 rows to big_table with the same value (1 on this case).
create table output as select a.p1 from big_table a join small_table b on (a.p1=b.p1);
select count from output ; will return only 1 row...
the reducer syslog:
...
2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 2147000000 rows: used memory = 32925960
2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 2148000000 rows: used memory = 12815184
2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 2149000000 rows: used memory = 26684552 <-- looks like wrong value..
...
2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 rows: used memory = 17715896
2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 1 rows
2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 1 rows
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 finished. closing...
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 forwarded 0 rows
2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done
TEST PLAN
EMPTY
REVISION DETAIL
https://reviews.facebook.net/D11283
AFFECTED FILES
ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractRowContainer.java
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java
MANAGE HERALD RULES
https://reviews.facebook.net/herald/view/differential/
WHY DID I GET THIS EMAIL?
https://reviews.facebook.net/herald/transcript/26817/
To: JIRA, navis
> Join on more than 2^31 records on single reducer failed (wrong results)
> -----------------------------------------------------------------------
>
> Key: HIVE-4730
> URL: https://issues.apache.org/jira/browse/HIVE-4730
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0
> Reporter: Gabi Kazav
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-4730.D11283.1.patch
>
>
> join on more than 2^31 rows leads to wrong results. for example:
> Create table small_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
> Create table big_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
> Loading 1 row to small_table (the value 1).
> Loading 2149580800 rows to big_table with the same value (1 on this case).
> create table output as select a.p1 from big_table a join small_table b on (a.p1=b.p1);
> select count(*) from output ; will return only 1 row...
> the reducer syslog:
> ...
> 2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 2147000000 rows: used memory = 32925960
> 2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 2148000000 rows: used memory = 12815184
> 2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 2149000000 rows: used memory = 26684552 <-- looks like wrong value..
> ...
> 2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 rows: used memory = 17715896
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 1 rows
> 2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 1 rows
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 finished. closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 forwarded 0 rows
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira