Posted to user@hive.apache.org by Suma Shivaprasad <su...@gmail.com> on 2014/07/30 15:13:37 UTC
Exception in Hive with SMB join and Parquet
I am using Hive 0.13.0 with a Parquet table that has 34 columns, created with
the following properties:
CLUSTERED BY (udid) SORTED BY (udid ASC) INTO 256 BUCKETS
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression"="SNAPPY");
The query I am running is:
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.mapjoin.smalltable.filesize=200000000;
set hive.vectorized.execution.enabled = true;
set hive.stats.fetch.column.stats=true;
set hive.stats.collect.tablekeys=true;
set hive.stats.reliable=true;
select sum(rev),sum(adimp)
from user_rr_parq rr join user_domain_parq dm on rr.udid = dm.id
where dt = '..' and hour='..'
and dm.age.source = '..'
and dm.age.id IN ('..')
group by rr.udid;
Both user_rr_parq and user_domain_parq are clustered and sorted by the same
join key.
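Having both tables bucketed and sorted on the join key is what allows Hive to plan a sort-merge bucket (SMB) join: a mapper reads a bucket of one table and merges it with the matching sorted bucket of the other, instead of loading one side into an in-memory hash table. The Java sketch below models only that merge step on two small in-memory lists already sorted by key. It is a conceptual illustration, not Hive's SMBMapJoinOperator, and the Row class and sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Conceptual sketch of the merge step in a sort-merge bucket join (not Hive's implementation). */
public class SortMergeJoinSketch {

    /** Hypothetical row: a join key plus one payload value. */
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Inner-joins two lists that are both sorted ascending by key.
     * Each input is scanned once; a run of equal keys on both sides
     * produces every left/right pairing for that key.
     */
    static List<String> join(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).key.compareTo(right.get(j).key);
            if (cmp < 0) {
                i++; // left key is smaller: it has no match on the right
            } else if (cmp > 0) {
                j++; // right key is smaller: it has no match on the left
            } else {
                String key = left.get(i).key;
                // Find where the run of this key ends on the right side.
                int runEnd = j;
                while (runEnd < right.size() && right.get(runEnd).key.equals(key)) {
                    runEnd++;
                }
                // Pair every left row with this key against the whole right run.
                while (i < left.size() && left.get(i).key.equals(key)) {
                    for (int k = j; k < runEnd; k++) {
                        out.add(key + ":" + left.get(i).value + "|" + right.get(k).value);
                    }
                    i++;
                }
                j = runEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two hypothetical buckets, each already sorted by key.
        List<Row> bigSide = List.of(
                new Row("a", "r1"), new Row("b", "r2"), new Row("b", "r3"), new Row("d", "r4"));
        List<Row> smallSide = List.of(
                new Row("b", "d1"), new Row("c", "d2"), new Row("d", "d3"));
        System.out.println(join(bigSide, smallSide)); // prints [b:r2|d1, b:r3|d1, d:r4|d3]
    }
}
```

Because both inputs arrive sorted, a streaming version of this merge only has to buffer one run of equal keys at a time, which is why the bucketing and sort order declared in the DDL matter for this join.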
Exception in the mapper logs:
2014-07-30 12:44:08,577 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2014-07-30 12:44:08,579 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"udid":"+HkGEOKZopHELKtUDdJzOUPr5yuSHxTHN5iknyzNSjE=","optout":null,"uage":null,"ugender":null,"siteid":null,"handsetid":null,"intversion":null,"intmethod":null,"intfamily":null,"intdirect":null,"intorigin":null,"advid":null,"campgnid":null,"adgrpidbig":null,"ccid":null,"locsrc":null,"adid":null,"adidbig":null,"market":null,"nfr":null,"uidparams":null,"time":null,"disc_uidparams":null,"vldclk":null,"fraudclk":null,"totalburn":null,"pubcpc":null,"cpc":null,"rev":0.0,"adimp":0,"pgimp":null,"mkvalidadreq":null,"mkvalidpgreq":null,"map_uid":null,"dt":"2014-06-01","hour":"00"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"udid":"+HkGEOKZopHELKtUDdJzOUPr5yuSHxTHN5iknyzNSjE=","optout":null,"uage":null,"ugender":null,"siteid":null,"handsetid":null,"intversion":null,"intmethod":null,"intfamily":null,"intdirect":null,"intorigin":null,"advid":null,"campgnid":null,"adgrpidbig":null,"ccid":null,"locsrc":null,"adid":null,"adidbig":null,"market":null,"nfr":null,"uidparams":null,"time":null,"disc_uidparams":null,"vldclk":null,"fraudclk":null,"totalburn":null,"pubcpc":null,"agencycpc":null,"rev":0.0,"adimp":0,"pgimp":null,"mkvalidadreq":null,"mkvalidpgreq":null,"map_uid":null,"dt":"2014-06-01","hour":"00"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 29, Size: 5
at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.nextHive(SMBMapJoinOperator.java:773)
at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.setupContext(SMBMapJoinOperator.java:710)
at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.setUpFetchContexts(SMBMapJoinOperator.java:538)
at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.processOp(SMBMapJoinOperator.java:248)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
... 9 more
Caused by: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 29, Size: 5
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:636)
at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.next(SMBMapJoinOperator.java:794)
at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.nextHive(SMBMapJoinOperator.java:771)
... 16 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 29, Size: 5
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:96)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:471)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:561)
... 18 more
It looks like Hive is trying to access the column at index 29, whereas only 5
non-null columns are present in the row, which matches the ArrayList size.
What could be going wrong here?
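The numbers in the exception can be checked against the logged row. If you count its fields, leaving out the partition columns dt and hour, you get the 34 columns mentioned above. The non-partition columns the query appears to read from user_rr_parq (udid, rev and adimp) sit at zero-based positions 0, 28 and 29. The Java sketch below redoes that count. It then shows that looking up position 29 in a 5-element java.util.ArrayList throws an IndexOutOfBoundsException, as in the trace. This illustrates the failure mode only. Which 5-entry list DataWritableReadSupport.init actually indexed is an assumption, and the placeholder names in that list are hypothetical. The exact exception message depends on the JDK; the trace above shows the older "Index: 29, Size: 5" format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: a position computed against the 34-column row, used against a 5-entry list. */
public class ColumnIndexMismatchSketch {

    // Non-partition columns of user_rr_parq in the order they appear in the logged row.
    // (Position 27 is logged as "cpc" in one dump and "agencycpc" in the other;
    // that does not change positions 28 and 29.)
    static final List<String> RR_COLUMNS = Arrays.asList(
            "udid", "optout", "uage", "ugender", "siteid", "handsetid", "intversion",
            "intmethod", "intfamily", "intdirect", "intorigin", "advid", "campgnid",
            "adgrpidbig", "ccid", "locsrc", "adid", "adidbig", "market", "nfr",
            "uidparams", "time", "disc_uidparams", "vldclk", "fraudclk", "totalburn",
            "pubcpc", "cpc", "rev", "adimp", "pgimp", "mkvalidadreq", "mkvalidpgreq",
            "map_uid");

    /** Zero-based positions of the requested column names within the schema. */
    static List<Integer> positions(List<String> schema, List<String> requested) {
        List<Integer> ids = new ArrayList<>();
        for (String name : requested) {
            ids.add(schema.indexOf(name));
        }
        return ids;
    }

    /** Requested positions that are not valid indexes into a list of the given size. */
    static List<Integer> outOfRange(List<Integer> ids, int size) {
        List<Integer> bad = new ArrayList<>();
        for (int id : ids) {
            if (id < 0 || id >= size) {
                bad.add(id);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<Integer> ids = positions(RR_COLUMNS, Arrays.asList("udid", "rev", "adimp"));
        System.out.println(RR_COLUMNS.size() + " columns; requested positions " + ids);

        // Hypothetical 5-entry list standing in for whatever the Parquet reader indexed.
        List<String> fiveEntries = new ArrayList<>(Arrays.asList("c0", "c1", "c2", "c3", "c4"));
        System.out.println("out of range for size 5: " + outOfRange(ids, fiveEntries.size()));
        try {
            fiveEntries.get(RR_COLUMNS.indexOf("adimp")); // position 29
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
```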
Thanks
Suma
Re: Exception in Hive with SMB join and Parquet
Posted by Suma Shivaprasad <su...@gmail.com>.
I have submitted a Hive patch for this issue:
https://issues.apache.org/jira/browse/HIVE-7629.
Can someone please review it?
On Wed, Jul 30, 2014 at 7:05 PM, Suma Shivaprasad <sumasai.shivaprasad@gmail.com> wrote:
> I retried with hive.optimize.sort.dynamic.partition=false and am still
> seeing the same issue.
>
> Thanks
> Suma
Re: Exception in Hive with SMB join and Parquet
Posted by Suma Shivaprasad <su...@gmail.com>.
I retried with hive.optimize.sort.dynamic.partition=false and am still seeing
the same issue.
Thanks
Suma
On Wed, Jul 30, 2014 at 6:55 PM, Nitin Pawar <ni...@gmail.com> wrote:
> What is the value of hive.optimize.sort.dynamic.partition?
>
> Can you try disabling it if it is on?
Re: Exception in Hive with SMB join and Parquet
Posted by Nitin Pawar <ni...@gmail.com>.
What is the value of hive.optimize.sort.dynamic.partition?
Can you try disabling it if it is on?
On Wed, Jul 30, 2014 at 6:43 PM, Suma Shivaprasad <sumasai.shivaprasad@gmail.com> wrote:
--
Nitin Pawar
Re: Exception in Hive with SMB join and Parquet
Posted by Nitin Pawar <ni...@gmail.com>.
What's the value of hive.optimize.sort.dynamic.partition?
Can you try disabling it if it is on?
On Wed, Jul 30, 2014 at 6:43 PM, Suma Shivaprasad <sumasai.shivaprasad@gmail.com> wrote:
> I am using Hive 0.13.0 with a Parquet table that has 34 columns, created with the following properties:
>
>
> CLUSTERED BY (udid) SORTED BY (udid ASC) INTO 256 BUCKETS
> STORED as PARQUET
> TBLPROPERTIES ("parquet.compression"="SNAPPY");
>
> The query I am running is
>
>
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> set hive.mapjoin.smalltable.filesize=200000000;
> set hive.vectorized.execution.enabled = true;
>
> set hive.stats.fetch.column.stats=true;
> set hive.stats.collect.tablekeys=true;
> set hive.stats.reliable=true;
>
> select sum(rev),sum(adimp)
> from user_rr_parq rr join user_domain_parq dm on rr.udid = dm.id
> where dt = '..' and hour='..'
> and dm.age.source = '..'
> and dm.age.id IN ('..')
> group by rr.udid;
>
>
> with both user_rr_parq and user_domain_parq clustered and sorted by the same join key.
>
> Exception in mapper logs:
>
> 2014-07-30 12:44:08,577 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>
>
> 2014-07-30 12:44:08,579 WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"udid":"+HkGEOKZopHELKtUDdJzOUPr5yuSHxTHN5iknyzNSjE=","optout":null,"uage":null,"ugender":null,"siteid":null,"handsetid":null,"intversion":null,"intmethod":null,"intfamily":null,"intdirect":null,"intorigin":null,"advid":null,"campgnid":null,"adgrpidbig":null,"ccid":null,"locsrc":null,"adid":null,"adidbig":null,"market":null,"nfr":null,"uidparams":null,"time":null,"disc_uidparams":null,"vldclk":null,"fraudclk":null,"totalburn":null,"pubcpc":null,"cpc":null,"rev":0.0,"adimp":0,"pgimp":null,"mkvalidadreq":null,"mkvalidpgreq":null,"map_uid":null,"dt":"2014-06-01","hour":"00"}
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"udid":"+HkGEOKZopHELKtUDdJzOUPr5yuSHxTHN5iknyzNSjE=","optout":null,"uage":null,"ugender":null,"siteid":null,"handsetid":null,"intversion":null,"intmethod":null,"intfamily":null,"intdirect":null,"intorigin":null,"advid":null,"campgnid":null,"adgrpidbig":null,"ccid":null,"locsrc":null,"adid":null,"adidbig":null,"market":null,"nfr":null,"uidparams":null,"time":null,"disc_uidparams":null,"vldclk":null,"fraudclk":null,"totalburn":null,"pubcpc":null,"agencycpc":null,"rev":0.0,"adimp":0,"pgimp":null,"mkvalidadreq":null,"mkvalidpgreq":null,"map_uid":null,"dt":"2014-06-01","hour":"00"}
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
> ... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 29, Size: 5
> at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.nextHive(SMBMapJoinOperator.java:773)
> at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.setupContext(SMBMapJoinOperator.java:710)
> at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.setUpFetchContexts(SMBMapJoinOperator.java:538)
> at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.processOp(SMBMapJoinOperator.java:248)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
> at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
> ... 9 more
> Caused by: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 29, Size: 5
> at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:636)
> at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.next(SMBMapJoinOperator.java:794)
> at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.nextHive(SMBMapJoinOperator.java:771)
> ... 16 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 29, Size: 5
> at java.util.ArrayList.RangeCheck(ArrayList.java:547)
> at java.util.ArrayList.get(ArrayList.java:322)
> at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:96)
> at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
> at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
> at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
> at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
> at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:471)
> at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:561)
> ... 18 more
>
> It looks like it is trying to access the column at index 29, whereas only 5 non-null columns are present in the row, which matches the ArrayList size.
>
> What could be going wrong here?
>
>
> Thanks
>
> Suma
>
>
>
>
--
Nitin Pawar
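Both tables in the quoted query are declared CLUSTERED BY and SORTED BY the join key, which is what allows hive.optimize.bucketmapjoin.sortedmerge to join them map-side, roughly by merging each bucket of one table with the matching bucket of the other (the SMBMapJoinOperator$MergeQueue frames in the trace). A minimal Java sketch of that idea, assuming rows are represented only by their join key and using String.hashCode as a stand-in for Hive's own bucketing hash; the class and method names are hypothetical and this is not Hive's SMBMapJoinOperator code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class BucketedSortMergeJoinSketch {
    // Stand-in bucket function (assumption): Hive uses its own hash, not String.hashCode.
    static int bucketOf(String key, int numBuckets) {
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    // Splits keys into numBuckets lists and sorts each one, mimicking
    // CLUSTERED BY (key) SORTED BY (key ASC) INTO numBuckets BUCKETS.
    static List<List<String>> bucketize(List<String> keys, int numBuckets) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) buckets.add(new ArrayList<>());
        for (String k : keys) buckets.get(bucketOf(k, numBuckets)).add(k);
        for (List<String> b : buckets) Collections.sort(b);
        return buckets;
    }

    // Merge-joins two sorted lists in a single pass, emitting the key once per matching pair.
    static List<String> mergeJoin(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp < 0) {
                i++;
            } else if (cmp > 0) {
                j++;
            } else {
                // Find the run of equal keys on each side and emit their cross product.
                String key = left.get(i);
                int iEnd = i, jEnd = j;
                while (iEnd < left.size() && left.get(iEnd).equals(key)) iEnd++;
                while (jEnd < right.size() && right.get(jEnd).equals(key)) jEnd++;
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) out.add(key);
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    // Bucket b of one side is only ever merged with bucket b of the other side.
    static List<String> join(List<String> bigSide, List<String> smallSide, int numBuckets) {
        List<List<String>> bigBuckets = bucketize(bigSide, numBuckets);
        List<List<String>> smallBuckets = bucketize(smallSide, numBuckets);
        List<String> out = new ArrayList<>();
        for (int b = 0; b < numBuckets; b++) {
            out.addAll(mergeJoin(bigBuckets.get(b), smallBuckets.get(b)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> bigSide = Arrays.asList("u3", "u1", "u2", "u1");
        List<String> smallSide = Arrays.asList("u1", "u3", "u4");
        System.out.println(join(bigSide, smallSide, 4));
    }
}
```

Running main prints [u1, u1, u3]: because both sides use the same bucket function, matching keys always land in the same bucket number, so each bucket pair can be merged independently without a shuffle.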