Posted to dev@hive.apache.org by "Ankit Malhotra (JIRA)" <ji...@apache.org> on 2013/10/14 23:43:43 UTC

[jira] [Commented] (HIVE-4175) Injection of emptyFile into input splits for empty partitions causes Deserializer to fail

    [ https://issues.apache.org/jira/browse/HIVE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794522#comment-13794522 ] 

Ankit Malhotra commented on HIVE-4175:
--------------------------------------

Ran into this while using Elephant Bird's protobuf deserializer. It only happens when I have partitions that don't contain any files.

My table:
{code}
CREATE  TABLE test_proto_v002(
  timestamp bigint COMMENT 'from deserializer', 
  auction_id bigint COMMENT 'from deserializer', 
  object_type int COMMENT 'from deserializer', 
  object_id int COMMENT 'from deserializer', 
  method int COMMENT 'from deserializer', 
  value double COMMENT 'from deserializer', 
  event_type int COMMENT 'from deserializer')
PARTITIONED BY ( 
  dy string, 
  dm string, 
  dd string, 
  dh string)
ROW FORMAT DELIMITED 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.SequenceFileInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
  'hdfs://localhost/logs/test_proto/v002'
TBLPROPERTIES (
  'transient_lastDdlTime'='1381346965')
{code}

Logs for the Hive job from the JobTracker:
{code}
....
2013-10-14 16:24:23,555 INFO org.apache.hadoop.mapred.MapTask: Processing split: Paths:/Users/amalhotra/hadoop/appdoop/tmp/hive/hive_2013-10-14_16-24-18_372_8603315140627624341/-mr-10002/1/emptyFile:0+87,/logs/test_proto/v002/2013/10/14/11/test_proto_1381765303443:0+541InputFormatClass: org.apache.hadoop.mapred.SequenceFileInputFormat
....
2013-10-14 16:24:24,079 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://localhost/logs/test_proto/v002/2013/10/14/11/test_proto_1381765303443
2013-10-14 16:24:24,079 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias test_proto_v002 for file hdfs://localhost/logs/test_proto/v002/2013/10/14/11
2013-10-14 16:24:24,080 FATAL ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable 
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
	at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1407)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
	at com.twitter.elephantbird.hive.serde.ProtobufDeserializer.deserialize(ProtobufDeserializer.java:56)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:525)
	... 9 more

2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 finished. closing... 
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 forwarded 0 rows
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:1
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 finished. closing... 
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 forwarded 0 rows
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 2 finished. closing... 
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 2 forwarded 0 rows
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 1 finished. closing... 
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 1 forwarded 0 rows
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 0 finished. closing... 
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 0 forwarded 0 rows
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 1 Close done
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 2 Close done
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 Close done
2013-10-14 16:24:24,080 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 Close done
2013-10-14 16:24:24,080 INFO ExecMapper: ExecMapper: processed 0 rows: used memory = 120821440
2013-10-14 16:24:24,084 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-10-14 16:24:24,086 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable 
	at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1407)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable 
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
	at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
	... 8 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
	at com.twitter.elephantbird.hive.serde.ProtobufDeserializer.deserialize(ProtobufDeserializer.java:56)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:525)
	... 9 more
2013-10-14 16:24:24,089 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
...
{code}
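Until the emptyFile injection itself is fixed, one defensive option on the SerDe side is to check the writable's type before the cast and skip the record instead of failing the task. The sketch below is generic and uses stand-in classes (not the real Hadoop or Elephant Bird types), just to illustrate the guard:

```java
// Stand-ins for org.apache.hadoop.io.Writable, Text, and BytesWritable so the
// sketch is self-contained; a real SerDe would use the actual Hadoop classes.
interface Writable {}

class Text implements Writable {}          // what the emptyFile split hands us

class BytesWritable implements Writable {  // what the protobuf SerDe expects
    private final byte[] bytes = new byte[0];
    byte[] getBytes() { return bytes; }
}

class GuardedDeserializer {
    // Returns null (record skipped) instead of throwing ClassCastException
    // when the split yields something other than a BytesWritable, e.g. the
    // empty Text record that comes from Hive's injected emptyFile.
    Object deserialize(Writable blob) {
        if (!(blob instanceof BytesWritable)) {
            return null;
        }
        byte[] payload = ((BytesWritable) blob).getBytes();
        return payload; // real code would parse the protobuf here
    }
}
```

Skipping malformed records this way trades a hard task failure for silently dropped rows, so it only makes sense if the non-BytesWritable input is known to carry no data, as is the case for emptyFile.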


> Injection of emptyFile into input splits for empty partitions causes Deserializer to fail
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-4175
>                 URL: https://issues.apache.org/jira/browse/HIVE-4175
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>         Environment: CDH4.2, using MR1
>            Reporter: James Kebinger
>            Priority: Minor
>
> My deserializer is expecting to receive one of 2 different subclasses of Writable, but in certain circumstances it receives an empty instance of org.apache.hadoop.io.Text. This only happens for task attempts where I observe the file called "emptyFile" in the list of input splits. 
> I'm doing queries over an external year/month/day partitioned table that I have eagerly created partitions for, so as of today, for example, I may run a query where year = 2013 and month = 3 that includes empty partitions.
> In the course of investigation I downloaded the sequence files to confirm they were ok. Once I realized that processing of empty partitions was to blame, I was able to work around the issue by bounding my queries to populated partitions.
> Can the need for the emptyFile be eliminated in the case where there's already a bunch of splits being processed? Failing that, can the mapper detect that the current input is from emptyFile and skip calling the deserializer?



--
This message was sent by Atlassian JIRA
(v6.1#6144)