
Posted to issues@spark.apache.org by "Joe Mudd (JIRA)" <ji...@apache.org> on 2015/01/27 22:10:34 UTC

[jira] [Created] (SPARK-5435) saveAsNewAPIHadoopDataset is not setting up the local configuration

Joe Mudd created SPARK-5435:
-------------------------------

             Summary: saveAsNewAPIHadoopDataset is not setting up the local configuration
                 Key: SPARK-5435
                 URL: https://issues.apache.org/jira/browse/SPARK-5435
             Project: Spark
          Issue Type: Bug
          Components: Input/Output
    Affects Versions: 1.2.0
         Environment: Cloudera 5.3.0
            Reporter: Joe Mudd


The HCatOutputFormat uses FileOutputFormatContainer, which calls the MRv1 FileOutputFormat.getUniqueName() method.  Since the local configuration has not been set up, getUniqueName() ends up throwing an IllegalArgumentException.

It appears that the writeShard function inside saveAsNewAPIHadoopDataset() needs to record Job information in the local Hadoop configuration, similar to HadoopRDD.addLocalConfiguration().  In a test build, I ended up setting both the MRv1 and MRv2 property names, since setting only the MRv2 names did not work.
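
To make the failure mode and the proposed workaround concrete, here is a rough, standalone sketch. It is not Spark or Hadoop code: a plain Map stands in for the Hadoop configuration, the class LocalConfSketch and its methods are hypothetical, and the key that getUniqueName() reads is assumed to be the MRv1 name (consistent with the MRv2 names alone not being enough). The property names are the MRv1 ones that HadoopRDD.addLocalConfiguration() sets, plus what I understand to be their MRv2 equivalents.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Simplified model only: a Map stands in for Hadoop's Configuration/JobConf.
class LocalConfSketch {

    // Models the guard in the MRv1 FileOutputFormat.getUniqueName(): if no task
    // partition is recorded in the configuration, the caller is treated as
    // running outside a Job. Reading the MRv1 key here is an assumption.
    static String getUniqueName(Map<String, String> conf, String name) {
        int partition = Integer.parseInt(conf.getOrDefault("mapred.task.partition", "-1"));
        if (partition == -1) {
            throw new IllegalArgumentException(
                "This method can only be called from within a Job");
        }
        boolean isMap = Boolean.parseBoolean(conf.getOrDefault("mapred.task.is.map", "true"));
        return String.format(Locale.ROOT, "%s-%s-%05d", name, isMap ? "m" : "r", partition);
    }

    // Hypothetical helper: records job/task identity under both naming schemes.
    static void addLocalConfiguration(String jobTrackerId, int jobId, int splitId,
                                      int attemptId, Map<String, String> conf) {
        String job = String.format(Locale.ROOT, "job_%s_%04d", jobTrackerId, jobId);
        String task = String.format(Locale.ROOT, "task_%s_%04d_m_%06d",
                                    jobTrackerId, jobId, splitId);
        String attempt = String.format(Locale.ROOT, "attempt_%s_%04d_m_%06d_%d",
                                       jobTrackerId, jobId, splitId, attemptId);

        // MRv1 names, as set by HadoopRDD.addLocalConfiguration().
        conf.put("mapred.tip.id", task);
        conf.put("mapred.task.id", attempt);
        conf.put("mapred.task.is.map", "true");
        conf.put("mapred.task.partition", Integer.toString(splitId));
        conf.put("mapred.job.id", job);

        // MRv2 names for the same values.
        conf.put("mapreduce.task.id", task);
        conf.put("mapreduce.task.attempt.id", attempt);
        conf.put("mapreduce.task.ismap", "true");
        conf.put("mapreduce.task.partition", Integer.toString(splitId));
        conf.put("mapreduce.job.id", job);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        try {
            getUniqueName(conf, "part");          // nothing recorded yet
        } catch (IllegalArgumentException e) {
            System.out.println("before: " + e.getMessage());
        }
        addLocalConfiguration("201501272210", 0, 3, 0, conf);
        System.out.println("after: " + getUniqueName(conf, "part")); // after: part-m-00003
    }
}
```

In the real code path the equivalent calls would presumably be made on the task-local Hadoop configuration inside writeShard, before the output format's getRecordWriter() is invoked.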

Here's the traceback:

java.lang.IllegalArgumentException: This method can only be called from within a Job
	at org.apache.hadoop.mapred.FileOutputFormat.getUniqueName(FileOutputFormat.java:286)
	at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:101)
	at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:260)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:984)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:965)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:56)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)


