Posted to user@hive.apache.org by Zhang Xiaoyu <zh...@gmail.com> on 2013/11/20 03:11:22 UTC

generating ORC file as output of a mapreduce job

Hi,
I am writing an MR job to generate data for Hive.

The code generates output in Text format without problems:

job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(Text.class);

But when I change the value class from Text.class to OrcOutputFormat.class,
it throws an exception:


2013-11-20 00:50:50,613 FATAL [main]
org.apache.hadoop.mapred.YarnChild: Error running child :
java.lang.VerifyError: class
org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcRequestHeaderProto
overrides final method
getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
	at org.apache.hadoop.util.ProtoUtil.makeRpcRequestHeader(ProtoUtil.java:165)
	at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:362)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1389)
	at org.apache.hadoop.ipc.Client.call(Client.java:1318)
	at org.apache.hadoop.ipc.Client.call(Client.java:1300)
	at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
	at sun.proxy.$Proxy6.getTask(Unknown Source)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:133)
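
A VerifyError of the form "overrides final method" usually means that two incompatible versions of one library are visible to the task. The trace names com.google.protobuf, so protobuf is the natural suspect, but that is an assumption and not something the log confirms. As a minimal sketch for checking it, the helper below prints which jar a given class is loaded from. ClassOrigin and locate are hypothetical names made up for illustration; only standard JDK calls are used.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Hadoop, Hive, or protobuf.
final class ClassOrigin {

    // Returns the location (usually a jar URL) that a class is loaded from,
    // or a short explanatory string when no location is available.
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK typically report no code source.
            if (src == null || src.getLocation() == null) {
                return "(no code source; likely a JDK class)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            // The class is missing from the classpath or could not be linked.
            return "(not loadable: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Defaults to the protobuf class named in the stack trace above.
        String name = args.length > 0 ? args[0] : "com.google.protobuf.UnknownFieldSet";
        System.out.println(name + " -> " + locate(name));
    }
}
```

Running it with the same classpath as the job (or calling locate from inside a task) shows which jar supplies the class. If that jar is not the protobuf version the Hadoop build expects, a classpath conflict becomes the likely explanation.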





My objective is to generate ORC files as the output of an MR job, so that I
can load the data into Hive directly. Any other approach that serves the
same objective would also be fine. Is there an HCatalog utility I can use
to do this?


Thanks a lot,

Johnny