Posted to issues@spark.apache.org by "Joe Mudd (JIRA)" <ji...@apache.org> on 2015/01/27 22:10:34 UTC
[jira] [Created] (SPARK-5435) saveAsNewAPIHadoopDataset is not setting up the local configuration
Joe Mudd created SPARK-5435:
-------------------------------
Summary: saveAsNewAPIHadoopDataset is not setting up the local configuration
Key: SPARK-5435
URL: https://issues.apache.org/jira/browse/SPARK-5435
Project: Spark
Issue Type: Bug
Components: Input/Output
Affects Versions: 1.2.0
Environment: Cloudera 5.3.0
Reporter: Joe Mudd
HCatOutputFormat uses FileOutputFormatContainer, which calls the MRv1 FileOutputFormat.getUniqueName() method. Because the local configuration has not been set up, getUniqueName() throws an IllegalArgumentException.
It appears that the writeShard function in saveAsNewAPIHadoopDataset() needs to record the job information in the local Hadoop configuration, similar to HadoopRDD.addLocalConfiguration(). In a test build, I ended up setting both the MRv1 and MRv2 property names, since setting only the MRv2 names did not work.
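As a rough illustration of what recording the job information under both sets of names could look like, here is a minimal sketch. It is not the test-build patch and not Spark or Hadoop code: the class LocalJobConfSketch and its method are hypothetical, a plain Map stands in for the Hadoop Configuration so the example runs without Hadoop on the classpath, and the isMap flag is an assumption about what the caller would pass. The property keys are the standard MRv1 (mapred.*) and MRv2 (mapreduce.*) task properties, and the ID strings follow Hadoop's usual job/task/attempt layout.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only. A Map<String, String> stands in
// for org.apache.hadoop.conf.Configuration to keep the sketch self-contained.
class LocalJobConfSketch {

    // Records job and task identity under both the MRv1 and MRv2 property names.
    static void addLocalConfiguration(String jobTrackerId, int jobId, int splitId,
                                      int attemptId, boolean isMap,
                                      Map<String, String> conf) {
        // Hadoop-style IDs: job_<tracker>_<4-digit job>,
        // task_..._<m|r>_<6-digit task>, attempt_..._<attempt>.
        String suffix = String.format("%s_%04d", jobTrackerId, jobId);
        String type = isMap ? "m" : "r";
        String jobIdStr = "job_" + suffix;
        String taskIdStr = String.format("task_%s_%s_%06d", suffix, type, splitId);
        String attemptIdStr =
            String.format("attempt_%s_%s_%06d_%d", suffix, type, splitId, attemptId);
        String partition = Integer.toString(splitId);

        // MRv1 property names.
        conf.put("mapred.job.id", jobIdStr);
        conf.put("mapred.tip.id", taskIdStr);
        conf.put("mapred.task.id", attemptIdStr);
        conf.put("mapred.task.is.map", Boolean.toString(isMap));
        conf.put("mapred.task.partition", partition);

        // MRv2 property names.
        conf.put("mapreduce.job.id", jobIdStr);
        conf.put("mapreduce.task.id", taskIdStr);
        conf.put("mapreduce.task.attempt.id", attemptIdStr);
        conf.put("mapreduce.task.ismap", Boolean.toString(isMap));
        conf.put("mapreduce.task.partition", partition);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // The tracker ID and numeric IDs below are made-up sample values.
        addLocalConfiguration("201501272210", 0, 3, 0, true, conf);
        conf.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In a real fix the same keys would be set on the task-local Hadoop Configuration before getRecordWriter() is called, so that the partition lookup in getUniqueName() finds a value.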
Here's the traceback:
java.lang.IllegalArgumentException: This method can only be called from within a Job
    at org.apache.hadoop.mapred.FileOutputFormat.getUniqueName(FileOutputFormat.java:286)
    at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:101)
    at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:260)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:984)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:965)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)