Posted to user@spark.apache.org by Mohit Anchlia <mo...@gmail.com> on 2015/08/13 19:49:28 UTC

Spark RuntimeException hadoop output format

I have this call trying to save to HDFS 2.6:

wordCounts.saveAsNewAPIHadoopFiles("prefix", "txt");

but I am getting the following:
java.lang.RuntimeException: class scala.runtime.Nothing$ not
org.apache.hadoop.mapreduce.OutputFormat
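(Editor's note: with the two-argument overload there is nothing for Scala to infer the key, value, and OutputFormat classes from, so they likely resolve to Nothing$, which is not an OutputFormat — hence the RuntimeException. A sketch of the explicit overload follows; the class choices are assumptions and must match the stream's actual pair types.)

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical call for a JavaPairDStream with Text keys and IntWritable
// values; the new-API TextOutputFormat lives in
// org.apache.hadoop.mapreduce.lib.output.
wordCounts.saveAsNewAPIHadoopFiles(
        "prefix", "txt",
        Text.class, IntWritable.class,
        TextOutputFormat.class);
```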

Re: Spark RuntimeException hadoop output format

Posted by Ted Yu <yu...@gmail.com>.
First you create the file:

    final File outputFile = new File(outputPath);

Then you write to it:

    Files.append(counts + "\n", outputFile, Charset.defaultCharset());

Cheers
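(Editor's note: the snippet above uses Guava's Files.append. For reference, an equivalent sketch using only the JDK standard library — the file name and contents here are hypothetical:)

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class AppendExample {
    public static void main(String[] args) throws IOException {
        Path outputFile = Paths.get("counts.txt"); // hypothetical local path
        String counts = "Counts at time 1000 ms [(hello,2)]";
        // CREATE + APPEND mirrors Guava's Files.append:
        // create the file if absent, append to it otherwise.
        Files.write(outputFile, (counts + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        Files.write(outputFile, (counts + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        // Two appends -> two lines in the file
        System.out.println(Files.readAllLines(outputFile).size());
        Files.deleteIfExists(outputFile);
    }
}
```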


Re: Spark RuntimeException hadoop output format

Posted by Mohit Anchlia <mo...@gmail.com>.
I thought prefix meant the output path? What's the purpose of prefix, and
where do I specify the path if not in prefix?


Re: Spark RuntimeException hadoop output format

Posted by Ted Yu <yu...@gmail.com>.
Please take a look at JavaPairDStream.scala:
 def saveAsHadoopFiles[F <: OutputFormat[_, _]](
      prefix: String,
      suffix: String,
      keyClass: Class[_],
      valueClass: Class[_],
      outputFormatClass: Class[F]) {

Did you intend to use outputPath as the prefix?

Cheers
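(Editor's note: based on that signature, a sketch of the full five-argument call — the path and class choices here are assumptions:)

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.TextOutputFormat;

// Hypothetical: prefix is a full path prefix, not just a file name.
// Spark Streaming writes one directory per batch, named
// <prefix>-<batchTime>.<suffix>. saveAsHadoopFiles takes the old-API
// OutputFormat from org.apache.hadoop.mapred.
wordCounts.saveAsHadoopFiles(
        "hdfs:///tmp/out", "txt",
        Text.class, Text.class,
        TextOutputFormat.class);
```

Note that each `<prefix>-<time>.txt` entry in an `hdfs dfs -ls` listing is a directory (the `drwxr-xr-x` lines in the thread show this), and the records land in `part-NNNNN` files inside it — a size of 0 on the directory line does not by itself mean there is no data.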



Re: Spark RuntimeException hadoop output format

Posted by Mohit Anchlia <mo...@gmail.com>.
Spark 1.3

Code:

wordCounts.foreachRDD(new Function2<JavaPairRDD<String, Integer>, Time, Void>() {

    @Override
    public Void call(JavaPairRDD<String, Integer> rdd, Time time) throws IOException {
        String counts = "Counts at time " + time + " " + rdd.collect();
        System.out.println(counts);
        System.out.println("Appending to " + outputFile.getAbsolutePath());
        Files.append(counts + "\n", outputFile, Charset.defaultCharset());
        return null;
    }
});

wordCounts.saveAsHadoopFiles(outputPath, "txt", Text.class, Text.class,
        TextOutputFormat.class);


What do I need to check in the namenode? I see 0-byte files like this:


drwxr-xr-x   - ec2-user supergroup          0 2015-08-13 15:45
/tmp/out-1439495124000.txt
drwxr-xr-x   - ec2-user supergroup          0 2015-08-13 15:45
/tmp/out-1439495125000.txt
drwxr-xr-x   - ec2-user supergroup          0 2015-08-13 15:45
/tmp/out-1439495126000.txt
drwxr-xr-x   - ec2-user supergroup          0 2015-08-13 15:45
/tmp/out-1439495127000.txt
drwxr-xr-x   - ec2-user supergroup          0 2015-08-13 15:45
/tmp/out-1439495128000.txt



However, I also wrote the data to a file on the local file system for
verification, and there I do see the data:


$ ls -ltr !$
ls -ltr /tmp/out
-rw-r--r-- 1 yarn yarn 5230 Aug 13 15:45 /tmp/out



Re: Spark RuntimeException hadoop output format

Posted by Ted Yu <yu...@gmail.com>.
Which Spark release are you using?

Can you show us a snippet of your code?

Have you checked the namenode log?

Thanks




Re: Spark RuntimeException hadoop output format

Posted by Mohit Anchlia <mo...@gmail.com>.
I was able to get this working by using an alternative method; however, I
only see 0-byte files in Hadoop. I've verified that the output exists in
the logs, but it's missing from HDFS.
