Posted to user@spark.apache.org by Arpan Ghosh <ar...@automatic.com> on 2014/12/31 01:00:38 UTC

Trouble using MultipleTextOutputFormat with Spark

Hi,

I am trying to use MultipleTextOutputFormat to give the output files of my
Spark job names other than the default "part-NNNNN".

I have implemented a custom MultipleTextOutputFormat class as follows:


    class DriveOutputRenameMultipleTextOutputFormat
        extends MultipleTextOutputFormat[String, Any] {
      override def generateFileNameForKeyValue(key: String, value: Any,
          name: String): String = {
        "DRIVE" + "-" + name.split("-")(1) + ".csv"
      }
    }
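To make the intended result concrete, here is a minimal sketch of the naming rule above on its own, with no Hadoop or Spark classes involved. The object name FileNameSketch and the helper driveFileName are made up for illustration, and the sketch assumes the incoming name has the default "part-NNNNN" form.

```scala
// Stand-alone copy of the naming rule; illustrative only, not the Hadoop API.
object FileNameSketch {
  // `name` is assumed to be a default leaf name such as "part-00000".
  def driveFileName(name: String): String =
    "DRIVE" + "-" + name.split("-")(1) + ".csv"

  def main(args: Array[String]): Unit = {
    // Prints the renamed file for one default part name.
    println(driveFileName("part-00000")) // DRIVE-00000.csv
  }
}
```

A name without a "-" would make split("-")(1) throw an ArrayIndexOutOfBoundsException, so the rule only holds for names of that form.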

When I call saveAsHadoopFile() on an RDD[(K, V)], I get the
following error:

    sc.textFile("/mnt/raw/drive/2014/10/29/part-00000")
      .map(x => (x, null))
      .saveAsHadoopFile("/mnt/test", classOf[String], classOf[Any],
        classOf[DriveOutputRenameMultipleTextOutputFormat])

java.lang.RuntimeException: java.lang.NoSuchMethodException: line210ee86336174025bcee4914212e1ff6168.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$DriveOutputRenameMultipleTextOutputFormat.<init>()
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
    at org.apache.hadoop.mapred.JobConf.getOutputFormat(JobConf.java:619)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1001)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:931)
Caused by: java.lang.NoSuchMethodException: line210ee86336174025bcee4914212e1ff6168.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$DriveOutputRenameMultipleTextOutputFormat.<init>()
    at java.lang.Class.getConstructor0(Class.java:2892)
    at java.lang.Class.getDeclaredConstructor(Class.java:2058)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)
    at org.apache.hadoop.mapred.JobConf.getOutputFormat(JobConf.java:619)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1001)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:931)
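
The trace shows a reflective lookup of a no-argument constructor (getDeclaredConstructor with no parameters) failing on a class whose name is nested inside many $iwC wrappers. One possible reading, offered as an assumption rather than a confirmed diagnosis, is that a class defined in the shell ends up as an inner class of those wrappers, so its only constructor takes the enclosing instance as a parameter. The sketch below reproduces that mechanism in plain Scala without Spark or Hadoop; ConstructorSketch, Outer, Inner, Standalone and hasNoArgConstructor are all hypothetical names.

```scala
// Illustrative only: shows how a class nested in another class's instance
// has no zero-argument constructor when looked up through reflection.
object ConstructorSketch {
  class Outer {
    // Nested in an instance of Outer, loosely analogous to a class
    // defined inside the shell's wrapper classes (an assumption).
    class Inner
  }

  // Nested in an object, so no enclosing instance is needed to build it.
  class Standalone

  // Same lookup that appears in the stack trace: a constructor with no parameters.
  def hasNoArgConstructor(c: Class[_]): Boolean =
    try { c.getDeclaredConstructor(); true }
    catch { case _: NoSuchMethodException => false }

  def main(args: Array[String]): Unit = {
    val outer = new Outer
    val innerClass = (new outer.Inner).getClass
    println(s"Inner has no-arg constructor: ${hasNoArgConstructor(innerClass)}")
    println(s"Standalone has no-arg constructor: ${hasNoArgConstructor(classOf[Standalone])}")
  }
}
```

If this reading applies, compiling the output format class into a jar as a top-level class and adding that jar to the job would be one thing to try instead of defining the class in the shell.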