You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Navis (JIRA)" <ji...@apache.org> on 2012/06/30 02:35:42 UTC
[jira] [Created] (HIVE-3218) When big table has two or more
partitions on SMBJoin it fails at runtime
Navis created HIVE-3218:
---------------------------
Summary: When big table has two or more partitions on SMBJoin it fails at runtime
Key: HIVE-3218
URL: https://issues.apache.org/jira/browse/HIVE-3218
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.10.0
Reporter: Navis
Assignee: Navis
Priority: Minor
{noformat}
drop table hive_test_smb_bucket1;
drop table hive_test_smb_bucket2;
create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
set hive.enforce.bucketing = true;
set hive.enforce.sorting = true;
insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
{noformat}
which make bucket join context..
{noformat}
Alias Bucket Output File Name Mapping:
hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
{noformat}
fails with exception
{noformat}
java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
... 8 more
{noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3218) Stream table of
SMBJoin/BucketMapJoin with two or more partitions is not handled properly
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416053#comment-13416053 ]
Namit Jain commented on HIVE-3218:
----------------------------------
I was thinking about the approaches 1,2,3.
2 seems better, since 3 would mean 1 mapper would be processing multiple files.
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3218) Stream table of
SMBJoin/BucketMapJoin with two or more partitions is not handled properly
Posted by "Navis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416071#comment-13416071 ]
Navis commented on HIVE-3218:
-----------------------------
For quries handling many partitions with many buckets, it would be possibly needed to use option 3 parallelly. I'm thinking it for another issue.
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3218) Stream table of SMBJoin/BucketMapJoin
with two or more partitions is not handled properly
Posted by "Navis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Navis updated HIVE-3218:
------------------------
Attachment: (was: HIVE-3218.1.patch.txt)
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3218) When big table has two or more
partitions on SMBJoin it fails at runtime
Posted by "Navis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Navis updated HIVE-3218:
------------------------
Status: Patch Available (was: Open)
https://reviews.facebook.net/D3933
Passed all ql/clientpositive tests. Complete test result will be updated next week.
> When big table has two or more partitions on SMBJoin it fails at runtime
> ------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Minor
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3218) Stream table of
SMBJoin/BucketMapJoin with two or more partitions is not handled properly
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423002#comment-13423002 ]
Namit Jain commented on HIVE-3218:
----------------------------------
I am missing something:
srcbucket20.txt=[srcbucket20.txt]
srcbucket21.txt=[srcbucket21.txt]
srcbucket22.txt=[srcbucket20.txt]
srcbucket23.txt=[srcbucket21.txt]
ds=2008-04-09/srcbucket20.txt=[srcbucket20.txt]
ds=2008-04-09/srcbucket21.txt=[srcbucket21.txt]
ds=2008-04-09/srcbucket22.txt=[srcbucket20.txt]
ds=2008-04-09/srcbucket23.txt=[srcbucket21.txt]
The mapping is:
small table alias -> big file table name -> list of small table file names
Shouldn't the big table file name and the small table file names be fully qualified ?
In the above example, bigtable file name is srcbucket20.txt and ds=2008-04-09/srcbucket20.txt. Why is it sometimes qualified by partition name and sometimes not ?
Similarly, shouldn't the small table file name be fully qualified ?
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3218) Stream table of
SMBJoin/BucketMapJoin with two or more partitions is not handled properly
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424860#comment-13424860 ]
Hudson commented on HIVE-3218:
------------------------------
Integrated in Hive-trunk-h0.21 #1575 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1575/])
HIVE-3218 Stream table of SMBJoin/BucketMapJoin with two or more
partitions is not handled properly (Navis via namit) (Revision 1367012)
Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1367012
Files :
* /hive/trunk/data/files/srcsortbucket1outof4.txt
* /hive/trunk/data/files/srcsortbucket2outof4.txt
* /hive/trunk/data/files/srcsortbucket3outof4.txt
* /hive/trunk/data/files/srcsortbucket4outof4.txt
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/BucketMatcher.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DefaultBucketMatcher.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecMapperContext.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketMapJoinOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedMergeBucketMapJoinOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MapJoinResolver.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/BucketMapJoinContext.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableSinkDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java
* /hive/trunk/ql/src/test/queries/clientpositive/bucketcontext_1.q
* /hive/trunk/ql/src/test/queries/clientpositive/bucketcontext_2.q
* /hive/trunk/ql/src/test/queries/clientpositive/bucketcontext_3.q
* /hive/trunk/ql/src/test/queries/clientpositive/bucketcontext_4.q
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin5.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin_negative2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/stats11.q.out
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt, hive.3218.2.patch
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3218) Stream table of
SMBJoin/BucketMapJoin with two or more partitions is not handled properly
Posted by "Navis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423676#comment-13423676 ]
Navis commented on HIVE-3218:
-----------------------------
addressed comments
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3218) Stream table of SMBJoin/BucketMapJoin
with two or more partitions is not handled properly
Posted by "Navis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Navis updated HIVE-3218:
------------------------
Attachment: HIVE-3218.1.patch.txt
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt, HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3218) Stream table of SMBJoin/BucketMapJoin
with two or more partitions is not handled properly
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-3218:
-----------------------------
Attachment: hive.3218.2.patch
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt, hive.3218.2.patch
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3218) Stream table of
SMBJoin/BucketMapJoin with two or more partitions is not handled properly
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423260#comment-13423260 ]
Namit Jain commented on HIVE-3218:
----------------------------------
comments on phabricator
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3218) Stream table of SMBJoin/BucketMapJoin
with two or more partitions is not handled properly
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-3218:
-----------------------------
Status: Patch Available (was: Open)
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt, hive.3218.2.patch
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3218) Stream table of
SMBJoin/BucketMapJoin with two or more partitions is not handled properly
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424698#comment-13424698 ]
Namit Jain commented on HIVE-3218:
----------------------------------
+1
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3218) Stream table of
SMBJoin/BucketMapJoin with two or more partitions is not handled properly
Posted by "Navis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424692#comment-13424692 ]
Navis commented on HIVE-3218:
-----------------------------
@Namin Jain, updated test file just now.
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3218) Stream table of SMBJoin/BucketMapJoin
with two or more partitions is not handled properly
Posted by "Navis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Navis updated HIVE-3218:
------------------------
Status: Patch Available (was: Open)
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt, HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3218) Stream table of SMBJoin/BucketMapJoin
with two or more partitions is not handled properly
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-3218:
-----------------------------
Resolution: Fixed
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)
Committed. Thanks Navis
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt, hive.3218.2.patch
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3218) When big table has two or more
partitions on SMBJoin it fails at runtime
Posted by "Navis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Navis updated HIVE-3218:
------------------------
Attachment: HIVE-3218.1.patch.txt
> When big table has two or more partitions on SMBJoin it fails at runtime
> ------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Minor
> Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3218) Stream table of SMBJoin/BucketMapJoin
with two or more partitions is not handled properly
Posted by "Navis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Navis updated HIVE-3218:
------------------------
Status: Patch Available (was: Open)
Rebased to trunk and added more comments
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3218) Stream table of SMBJoin/BucketMapJoin
with two or more partitions is not handled properly
Posted by "Navis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Navis updated HIVE-3218:
------------------------
Priority: Critical (was: Minor)
Summary: Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly (was: When big table has two or more partitions on SMBJoin it fails at runtime)
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3218) Stream table of
SMBJoin/BucketMapJoin with two or more partitions is not handled properly
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424691#comment-13424691 ]
Namit Jain commented on HIVE-3218:
----------------------------------
@Navis, can you update the test file ?
Let us try to get this in.
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3218) Stream table of
SMBJoin/BucketMapJoin with two or more partitions is not handled properly
Posted by "Navis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404919#comment-13404919 ]
Navis commented on HIVE-3218:
-----------------------------
For BucketedMapJoin it just returns invalid result, which is worse than SMBJoin case. (bucketedmapjoin5.q test is broken)
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3218) Stream table of SMBJoin/BucketMapJoin
with two or more partitions is not handled properly
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-3218:
-----------------------------
Status: Open (was: Patch Available)
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3218) When big table has two or more
partitions on SMBJoin it fails at runtime
Posted by "Navis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404343#comment-13404343 ]
Navis commented on HIVE-3218:
-----------------------------
It's caused by having duplicated taskID for each partition of stream table. Three option is possible.
1. Do not allow this
2. Augment task-ID with partition spec
3. Combine files with same bucket ID in BucketizedHiveInputFormat
I've implemented option-2 and on testing. But option-3 seemed to be more safe (and easier).
I think this is possibly happen with bucket mapjoin.
> When big table has two or more partitions on SMBJoin it fails at runtime
> ------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Minor
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3218) Stream table of
SMBJoin/BucketMapJoin with two or more partitions is not handled properly
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422993#comment-13422993 ]
Namit Jain commented on HIVE-3218:
----------------------------------
A M data/files/srcsbucket20.txt (118 lines) - -
A M data/files/srcsbucket21.txt (120 lines) - -
A M data/files/srcsbucket22.txt (124 lines) - -
A M data/files/srcsbucket23.txt (138 lines)
Is it sorted and bucketed ?
If yes, can you change the names of these files to
srcsortbucket1outof4.txt
srcsortbucket2outof4.txt ..
These files might be used for a lot of tests, so it might be a good idea to be clear about the names of these files.
Sorry about being picky on the name of these files.
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3218) Stream table of
SMBJoin/BucketMapJoin with two or more partitions is not handled properly
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423696#comment-13423696 ]
Namit Jain commented on HIVE-3218:
----------------------------------
minor comments on phabricator
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3218) Stream table of SMBJoin/BucketMapJoin
with two or more partitions is not handled properly
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-3218:
-----------------------------
Status: Open (was: Patch Available)
comments
> Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Critical
> Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3218) When big table has two or more
partitions on SMBJoin it fails at runtime
Posted by "Navis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Navis updated HIVE-3218:
------------------------
Status: Open (was: Patch Available)
This happens with bucket mapjoin, too. It would be better to fix it along with smb join.
> When big table has two or more partitions on SMBJoin it fails at runtime
> ------------------------------------------------------------------------
>
> Key: HIVE-3218
> URL: https://issues.apache.org/jira/browse/HIVE-3218
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Navis
> Assignee: Navis
> Priority: Minor
> Attachments: HIVE-3218.1.patch.txt
>
>
> {noformat}
> drop table hive_test_smb_bucket1;
> drop table hive_test_smb_bucket2;
> create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
> insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
> insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SELECT /* + MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
> {noformat}
> which make bucket join context..
> {noformat}
> Alias Bucket Output File Name Mapping:
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
> hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
> {noformat}
> fails with exception
> {noformat}
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
> ... 8 more
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira