Posted to dev@hive.apache.org by "David Chen (JIRA)" <ji...@apache.org> on 2014/04/02 02:55:16 UTC

[jira] [Commented] (HIVE-4329) HCatalog should use getHiveRecordWriter rather than getRecordWriter

    [ https://issues.apache.org/jira/browse/HIVE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957201#comment-13957201 ] 

David Chen commented on HIVE-4329:
----------------------------------

I think the correct fix is for HCatalog to call the {{OutputFormat}}s' {{getHiveRecordWriter}} rather than {{getRecordWriter}}. Since the purpose of HCatalog is to provide read and write interfaces and the Hive Metastore's services to non-Hive clients, existing SerDes should work out of the box.
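
For concreteness, here is a rough sketch of that call path, assuming the standard {{HiveOutputFormat}} interface (the {{openWriter}} helper and its name are mine, not from any patch). Note that {{getHiveRecordWriter}} takes no key class at all and hands the SerDe's {{Writable}} straight to the writer:

{code}
import java.io.IOException;
import java.util.Properties;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.FileSinkOperator;
import org.apache.hadoop.hive.ql.io.HiveOutputFormat;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Progressable;

public class HiveRecordWriterSketch {
  // Hypothetical helper: obtain the writer through the Hive-specific API.
  // No key type appears anywhere in the signature, so the
  // LongWritable-vs-NullWritable mismatch described below cannot arise.
  static FileSinkOperator.RecordWriter openWriter(
      HiveOutputFormat<?, ?> outputFormat, JobConf jc, Path finalOutPath,
      Class<? extends Writable> valueClass, boolean isCompressed,
      Properties tableProperties, Progressable progress) throws IOException {
    return outputFormat.getHiveRecordWriter(jc, finalOutPath, valueClass,
        isCompressed, tableProperties, progress);
  }
}
{code}

The returned {{FileSinkOperator.RecordWriter}} exposes only {{write(Writable)}} and {{close(boolean)}}, so the container never has to invent a key.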

Fixing it this way will also allow other SerDes, such as Parquet, to work with HCatalog, since the ParquetSerDe currently has the same problem.

> HCatalog should use getHiveRecordWriter rather than getRecordWriter
> -------------------------------------------------------------------
>
>                 Key: HIVE-4329
>                 URL: https://issues.apache.org/jira/browse/HIVE-4329
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog, Serializers/Deserializers
>    Affects Versions: 0.10.0
>         Environment: discovered in Pig, but it looks like the root cause impacts all non-Hive users
>            Reporter: Sean Busbey
>            Assignee: David Chen
>
> Attempting to write to a HCatalog defined table backed by the AvroSerde fails with the following stacktrace:
> {code}
> java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.io.LongWritable
> 	at org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat$1.write(AvroContainerOutputFormat.java:84)
> 	at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:253)
> 	at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:53)
> 	at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:242)
> 	at org.apache.hcatalog.pig.HCatStorer.putNext(HCatStorer.java:52)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
> 	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:559)
> 	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> {code}
> The proximal cause of this failure is that AvroContainerOutputFormat's signature mandates a LongWritable key while HCat's FileRecordWriterContainer forces a NullWritable (the erasure mechanics are sketched below). I'm not sure of a general fix, other than redefining HiveOutputFormat to mandate a WritableComparable.
> It looks like the other Hive OutputFormats already accept WritableComparable, and there's no reason AvroContainerOutputFormat couldn't be changed to match, since it ignores the key. That way, making FileRecordWriterContainer always use NullWritable could be spun off into a separate issue.
> The underlying cause of the failure to write to AvroSerde-backed tables is that AvroContainerOutputFormat doesn't meaningfully implement getRecordWriter, so fixing the above will only push the failure into the placeholder RecordWriter.
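> For illustration, a minimal, self-contained sketch of the erasure mechanics behind that ClassCastException (the {{Writer}} interface here is a stand-in, not the real mapred API):
> {code}
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.NullWritable;
>
> public class KeyCastSketch {
>   // Stand-in for a RecordWriter parameterized on the key type.
>   interface Writer<K> {
>     void write(K key);
>   }
>
>   @SuppressWarnings({"rawtypes", "unchecked"})
>   public static void main(String[] args) {
>     // Mirrors AvroContainerOutputFormat$1: an anonymous writer that
>     // assumes LongWritable keys.
>     Writer<LongWritable> writer = new Writer<LongWritable>() {
>       public void write(LongWritable key) {
>         System.out.println(key.get());
>       }
>     };
>     // Mirrors FileRecordWriterContainer forcing a NullWritable through an
>     // erased (raw) reference: it compiles, but the compiler-generated
>     // bridge method casts the key to LongWritable at runtime, producing
>     // the same ClassCastException as the stack trace above.
>     Writer raw = writer;
>     raw.write(NullWritable.get());
>   }
> }
> {code}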



--
This message was sent by Atlassian JIRA
(v6.2#6252)