Posted to user@spark.apache.org by Du Li <li...@yahoo-inc.com.INVALID> on 2014/09/13 02:47:44 UTC

NullWritable not serializable

Hi,

I was trying the following on spark-shell (built from apache master with hadoop 2.4.0). Both rdd2.collect and rdd3.collect threw java.io.NotSerializableException: org.apache.hadoop.io.NullWritable.

I got the same problem in similar code in my app, which uses the newly released Spark 1.1.0 under hadoop 2.4.0. Previously it worked fine with spark 1.0.2 under either hadoop 2.4.0 or 0.23.10.

Does anybody know what caused the problem?

Thanks,
Du

----
import org.apache.hadoop.io.{NullWritable, Text}
val rdd = sc.textFile("README.md")
val res = rdd.map(x => (NullWritable.get(), new Text(x)))
res.saveAsSequenceFile("./test_data")
val rdd2 = sc.sequenceFile("./test_data", classOf[NullWritable], classOf[Text])
rdd2.collect
val rdd3 = sc.sequenceFile[NullWritable,Text]("./test_data")
rdd3.collect



RE: NullWritable not serializable

Posted by "Yan Zhou.sc" <Ya...@huawei.com>.
There appears to be a newly added Boolean in DAGScheduler that defaults to "false":

private val localExecutionEnabled = sc.getConf.getBoolean("spark.localExecution.enabled", false)

Then

val shouldRunLocally =
        localExecutionEnabled && allowLocal && finalStage.parents.isEmpty && partitions.length == 1


I'm wondering whether "running locally" is now disabled by default.

Yan
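
For reference, a minimal sketch of flipping that flag back on through SparkConf, assuming the "spark.localExecution.enabled" key quoted above is the one DAGScheduler consults (the app name below is just a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

// Re-enable driver-local execution of single-partition actions such as first()/take()
val conf = new SparkConf()
  .setMaster("local")
  .setAppName("local-execution-check")            // placeholder name
  .set("spark.localExecution.enabled", "true")
val sc = new SparkContext(conf)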

From: Du Li [mailto:lidu@yahoo-inc.com.INVALID]
Sent: Tuesday, September 16, 2014 12:26 PM
To: Matei Zaharia
Cc: user@spark.apache.org; dev@spark.apache.org
Subject: Re: NullWritable not serializable

Hi,

The test case is separated out as follows. The call to rdd2.first() breaks when the spark version is changed to 1.1.0, reporting the exception that NullWritable is not serializable. However, the same test passed with spark 1.0.2. The pom.xml file is attached. The test data README.md was copied from spark.

Thanks,
Du
-----

package com.company.project.test

import org.scalatest._

class WritableTestSuite extends FunSuite {
  test("generated sequence file should be readable from spark") {
    import org.apache.hadoop.io.{NullWritable, Text}
    import org.apache.spark.{SparkContext, SparkConf}
    import org.apache.spark.SparkContext._

    val conf = new SparkConf(false).setMaster("local").setAppName("test data exchange with spark")
    val sc = new SparkContext(conf)

    val rdd = sc.textFile("README.md")
    val res = rdd.map(x => (NullWritable.get(), new Text(x)))
    res.saveAsSequenceFile("./test_data")

    val rdd2 = sc.sequenceFile("./test_data", classOf[NullWritable], classOf[Text])

    assert(rdd.first == rdd2.first._2.toString)
  }
}



From: Matei Zaharia <ma...@gmail.com>
Date: Monday, September 15, 2014 at 10:52 PM
To: Du Li <li...@yahoo-inc.com>
Cc: "user@spark.apache.org" <us...@spark.apache.org>, "dev@spark.apache.org" <de...@spark.apache.org>
Subject: Re: NullWritable not serializable

Can you post the exact code for the test that worked in 1.0? I can't think of much that could've changed. The one possibility is if  we had some operations that were computed locally on the driver (this happens with things like first() and take(), which will try to do the first partition locally). But generally speaking these operations should *not* work over a network, so you'll have to make sure that you only send serializable types through shuffles or collects, or use a serialization framework like Kryo that might be okay with Writables.

Matei
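
On the Kryo route mentioned above, a minimal sketch of what switching serializers could look like in 1.1; the registrator class name is hypothetical, and whether Kryo actually copes with Writables without a custom serializer registered for them would still need to be verified:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("kryo-writable-check")                                        // placeholder name
  // Replace Java serialization with Kryo for data shipped in shuffles and collects
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Optionally point at a registrator that registers the Writable classes with Kryo
  .set("spark.kryo.registrator", "com.company.project.MyKryoRegistrator")   // hypothetical class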


On September 15, 2014 at 9:13:13 PM, Du Li (lidu@yahoo-inc.com) wrote:
Hi Matei,

Thanks for your reply.

The Writable classes have never been serializable, which is why this is weird. I did try, as you suggested, to map the Writables to integers and strings. It didn't pass either. Similar exceptions were thrown, except that the messages said IntWritable and Text are not serializable. The reason is the implicits defined in the SparkContext object, which convert those values into their corresponding Writable classes before saving the data in the sequence file.

My original code was actually some test cases to try out the SequenceFile-related APIs. The tests all passed when the spark version was specified as 1.0.2. But this one failed after I changed the spark version to 1.1.0, the new release; nothing else changed. In addition, it failed when I called rdd2.collect(), take(1), and first(), but it worked fine when calling rdd2.count(). As you can see, count() does not need to serialize and ship data while the other three methods do.

Do you recall any difference between spark 1.0 and 1.1 that might cause this problem?

Thanks,
Du
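
To make the point about the SparkContext implicits concrete, a rough sketch of the Int/String variant described above (run in spark-shell; the key choice is arbitrary):

import org.apache.spark.SparkContext._           // brings the Writable converters into scope

val plain = sc.textFile("README.md").map(line => (line.length, line))   // plain (Int, String) pairs
plain.saveAsSequenceFile("./test_data_plain")    // the implicits wrap them as (IntWritable, Text) on write
val back = sc.sequenceFile[Int, String]("./test_data_plain")
back.count()    // count() only returns per-partition counts to the driver, so no Writables are shipped
back.first()    // per the report above, this is where the IntWritable/Text not serializable error shows up in 1.1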


From: Matei Zaharia <ma...@gmail.com>
Date: Friday, September 12, 2014 at 9:10 PM
To: Du Li <li...@yahoo-inc.com.invalid>, "user@spark.apache.org" <us...@spark.apache.org>, "dev@spark.apache.org" <de...@spark.apache.org>
Subject: Re: NullWritable not serializable


Hi Du,

I don't think NullWritable has ever been serializable, so you must be doing something differently from your previous program. In this case though, just use a map() to turn your Writables to serializable types (e.g. null and String).

Matei


On September 12, 2014 at 8:48:36 PM, Du Li (lidu@yahoo-inc.com.invalid) wrote:
Hi,

I was trying the following on spark-shell (built from apache master with hadoop 2.4.0). Both rdd2.collect and rdd3.collect threw java.io.NotSerializableException: org.apache.hadoop.io.NullWritable.

I got the same problem in similar code in my app, which uses the newly released Spark 1.1.0 under hadoop 2.4.0. Previously it worked fine with spark 1.0.2 under either hadoop 2.4.0 or 0.23.10.

Does anybody know what caused the problem?

Thanks,
Du

----
import org.apache.hadoop.io.{NullWritable, Text}
val rdd = sc.textFile("README.md")
val res = rdd.map(x => (NullWritable.get(), new Text(x)))
res.saveAsSequenceFile("./test_data")
val rdd2 = sc.sequenceFile("./test_data", classOf[NullWritable], classOf[Text])
rdd2.collect
val rdd3 = sc.sequenceFile[NullWritable,Text]("./test_data")
rdd3.collect



Re: NullWritable not serializable

Posted by Du Li <li...@yahoo-inc.com.INVALID>.
Hi,

The test case is separated out as follows. The call to rdd2.first() breaks when the spark version is changed to 1.1.0, reporting the exception that NullWritable is not serializable. However, the same test passed with spark 1.0.2. The pom.xml file is attached. The test data README.md was copied from spark.

Thanks,
Du
-----

package com.company.project.test

import org.scalatest._

class WritableTestSuite extends FunSuite {
  test("generated sequence file should be readable from spark") {
    import org.apache.hadoop.io.{NullWritable, Text}
    import org.apache.spark.{SparkContext, SparkConf}
    import org.apache.spark.SparkContext._

    val conf = new SparkConf(false).setMaster("local").setAppName("test data exchange with spark")
    val sc = new SparkContext(conf)

    val rdd = sc.textFile("README.md")
    val res = rdd.map(x => (NullWritable.get(), new Text(x)))
    res.saveAsSequenceFile("./test_data")

    val rdd2 = sc.sequenceFile("./test_data", classOf[NullWritable], classOf[Text])

    assert(rdd.first == rdd2.first._2.toString)
  }
}



From: Matei Zaharia <ma...@gmail.com>
Date: Monday, September 15, 2014 at 10:52 PM
To: Du Li <li...@yahoo-inc.com>
Cc: "user@spark.apache.org" <us...@spark.apache.org>, "dev@spark.apache.org" <de...@spark.apache.org>
Subject: Re: NullWritable not serializable

Can you post the exact code for the test that worked in 1.0? I can't think of much that could've changed. The one possibility is if  we had some operations that were computed locally on the driver (this happens with things like first() and take(), which will try to do the first partition locally). But generally speaking these operations should *not* work over a network, so you'll have to make sure that you only send serializable types through shuffles or collects, or use a serialization framework like Kryo that might be okay with Writables.

Matei


On September 15, 2014 at 9:13:13 PM, Du Li (lidu@yahoo-inc.com) wrote:

Hi Matei,

Thanks for your reply.

The Writable classes have never been serializable, which is why this is weird. I did try, as you suggested, to map the Writables to integers and strings. It didn't pass either. Similar exceptions were thrown, except that the messages said IntWritable and Text are not serializable. The reason is the implicits defined in the SparkContext object, which convert those values into their corresponding Writable classes before saving the data in the sequence file.

My original code was actually some test cases to try out the SequenceFile-related APIs. The tests all passed when the spark version was specified as 1.0.2. But this one failed after I changed the spark version to 1.1.0, the new release; nothing else changed. In addition, it failed when I called rdd2.collect(), take(1), and first(), but it worked fine when calling rdd2.count(). As you can see, count() does not need to serialize and ship data while the other three methods do.

Do you recall any difference between spark 1.0 and 1.1 that might cause this problem?

Thanks,
Du


From: Matei Zaharia <ma...@gmail.com>
Date: Friday, September 12, 2014 at 9:10 PM
To: Du Li <li...@yahoo-inc.com.invalid>, "user@spark.apache.org" <us...@spark.apache.org>, "dev@spark.apache.org" <de...@spark.apache.org>
Subject: Re: NullWritable not serializable

Hi Du,

I don't think NullWritable has ever been serializable, so you must be doing something differently from your previous program. In this case though, just use a map() to turn your Writables to serializable types (e.g. null and String).

Matei


On September 12, 2014 at 8:48:36 PM, Du Li (lidu@yahoo-inc.com.invalid) wrote:

Hi,

I was trying the following on spark-shell (built from apache master with hadoop 2.4.0). Both rdd2.collect and rdd3.collect threw java.io.NotSerializableException: org.apache.hadoop.io.NullWritable.

I got the same problem in similar code in my app, which uses the newly released Spark 1.1.0 under hadoop 2.4.0. Previously it worked fine with spark 1.0.2 under either hadoop 2.4.0 or 0.23.10.

Does anybody know what caused the problem?

Thanks,
Du

----
import org.apache.hadoop.io.{NullWritable, Text}
val rdd = sc.textFile("README.md")
val res = rdd.map(x => (NullWritable.get(), new Text(x)))
res.saveAsSequenceFile("./test_data")
val rdd2 = sc.sequenceFile("./test_data", classOf[NullWritable], classOf[Text])
rdd2.collect
val rdd3 = sc.sequenceFile[NullWritable,Text]("./test_data")
rdd3.collect



Re: NullWritable not serializable

Posted by Matei Zaharia <ma...@gmail.com>.
Can you post the exact code for the test that worked in 1.0? I can't think of much that could've changed. The one possibility is if  we had some operations that were computed locally on the driver (this happens with things like first() and take(), which will try to do the first partition locally). But generally speaking these operations should *not* work over a network, so you'll have to make sure that you only send serializable types through shuffles or collects, or use a serialization framework like Kryo that might be okay with Writables.

Matei

On September 15, 2014 at 9:13:13 PM, Du Li (lidu@yahoo-inc.com) wrote:

Hi Matei,

Thanks for your reply. 

The Writable classes have never been serializable, which is why this is weird. I did try, as you suggested, to map the Writables to integers and strings. It didn't pass either. Similar exceptions were thrown, except that the messages said IntWritable and Text are not serializable. The reason is the implicits defined in the SparkContext object, which convert those values into their corresponding Writable classes before saving the data in the sequence file.

My original code was actually some test cases to try out the SequenceFile-related APIs. The tests all passed when the spark version was specified as 1.0.2. But this one failed after I changed the spark version to 1.1.0, the new release; nothing else changed. In addition, it failed when I called rdd2.collect(), take(1), and first(), but it worked fine when calling rdd2.count(). As you can see, count() does not need to serialize and ship data while the other three methods do.

Do you recall any difference between spark 1.0 and 1.1 that might cause this problem?

Thanks,
Du


From: Matei Zaharia <ma...@gmail.com>
Date: Friday, September 12, 2014 at 9:10 PM
To: Du Li <li...@yahoo-inc.com.invalid>, "user@spark.apache.org" <us...@spark.apache.org>, "dev@spark.apache.org" <de...@spark.apache.org>
Subject: Re: NullWritable not serializable

Hi Du,

I don't think NullWritable has ever been serializable, so you must be doing something differently from your previous program. In this case though, just use a map() to turn your Writables to serializable types (e.g. null and String).

Matei

On September 12, 2014 at 8:48:36 PM, Du Li (lidu@yahoo-inc.com.invalid) wrote:

Hi,

I was trying the following on spark-shell (built from apache master with hadoop 2.4.0). Both rdd2.collect and rdd3.collect threw java.io.NotSerializableException: org.apache.hadoop.io.NullWritable.

I got the same problem in similar code in my app, which uses the newly released Spark 1.1.0 under hadoop 2.4.0. Previously it worked fine with spark 1.0.2 under either hadoop 2.4.0 or 0.23.10.

Does anybody know what caused the problem?

Thanks,
Du

----
import org.apache.hadoop.io.{NullWritable, Text}
val rdd = sc.textFile("README.md")
val res = rdd.map(x => (NullWritable.get(), new Text(x)))
res.saveAsSequenceFile("./test_data")
val rdd2 = sc.sequenceFile("./test_data", classOf[NullWritable], classOf[Text])
rdd2.collect
val rdd3 = sc.sequenceFile[NullWritable,Text]("./test_data")
rdd3.collect



Re: NullWritable not serializable

Posted by Du Li <li...@yahoo-inc.com.INVALID>.
Hi Matei,

Thanks for your reply.

The Writable classes have never been serializable, which is why this is weird. I did try, as you suggested, to map the Writables to integers and strings. It didn't pass either. Similar exceptions were thrown, except that the messages said IntWritable and Text are not serializable. The reason is the implicits defined in the SparkContext object, which convert those values into their corresponding Writable classes before saving the data in the sequence file.

My original code was actually some test cases to try out the SequenceFile-related APIs. The tests all passed when the spark version was specified as 1.0.2. But this one failed after I changed the spark version to 1.1.0, the new release; nothing else changed. In addition, it failed when I called rdd2.collect(), take(1), and first(), but it worked fine when calling rdd2.count(). As you can see, count() does not need to serialize and ship data while the other three methods do.

Do you recall any difference between spark 1.0 and 1.1 that might cause this problem?

Thanks,
Du


From: Matei Zaharia <ma...@gmail.com>
Date: Friday, September 12, 2014 at 9:10 PM
To: Du Li <li...@yahoo-inc.com.invalid>, "user@spark.apache.org" <us...@spark.apache.org>, "dev@spark.apache.org" <de...@spark.apache.org>
Subject: Re: NullWritable not serializable

Hi Du,

I don't think NullWritable has ever been serializable, so you must be doing something differently from your previous program. In this case though, just use a map() to turn your Writables to serializable types (e.g. null and String).

Matei


On September 12, 2014 at 8:48:36 PM, Du Li (lidu@yahoo-inc.com.invalid) wrote:

Hi,

I was trying the following on spark-shell (built from apache master with hadoop 2.4.0). Both rdd2.collect and rdd3.collect threw java.io.NotSerializableException: org.apache.hadoop.io.NullWritable.

I got the same problem in similar code in my app, which uses the newly released Spark 1.1.0 under hadoop 2.4.0. Previously it worked fine with spark 1.0.2 under either hadoop 2.4.0 or 0.23.10.

Does anybody know what caused the problem?

Thanks,
Du

----
import org.apache.hadoop.io.{NullWritable, Text}
val rdd = sc.textFile("README.md")
val res = rdd.map(x => (NullWritable.get(), new Text(x)))
res.saveAsSequenceFile("./test_data")
val rdd2 = sc.sequenceFile("./test_data", classOf[NullWritable], classOf[Text])
rdd2.collect
val rdd3 = sc.sequenceFile[NullWritable,Text]("./test_data")
rdd3.collect



Re: NullWritable not serializable

Posted by Matei Zaharia <ma...@gmail.com>.
Hi Du,

I don't think NullWritable has ever been serializable, so you must be doing something differently from your previous program. In this case though, just use a map() to turn your Writables to serializable types (e.g. null and String).

Matei
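
Concretely, a minimal sketch of that workaround applied to the snippet quoted below (same spark-shell session assumed): drop the NullWritable key and turn each Text value into a plain String before collecting.

import org.apache.hadoop.io.{NullWritable, Text}

val rdd2 = sc.sequenceFile("./test_data", classOf[NullWritable], classOf[Text])
val lines = rdd2.map { case (_, text) => text.toString }   // Text -> String, key dropped; nothing non-serializable left
lines.collect()                                            // only plain Strings travel back to the driver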

On September 12, 2014 at 8:48:36 PM, Du Li (lidu@yahoo-inc.com.invalid) wrote:

Hi,

I was trying the following on spark-shell (built from apache master with hadoop 2.4.0). Both rdd2.collect and rdd3.collect threw java.io.NotSerializableException: org.apache.hadoop.io.NullWritable.

I got the same problem in similar code in my app, which uses the newly released Spark 1.1.0 under hadoop 2.4.0. Previously it worked fine with spark 1.0.2 under either hadoop 2.4.0 or 0.23.10.

Does anybody know what caused the problem?

Thanks,
Du

----
import org.apache.hadoop.io.{NullWritable, Text}
val rdd = sc.textFile("README.md")
val res = rdd.map(x => (NullWritable.get(), new Text(x)))
res.saveAsSequenceFile("./test_data")
val rdd2 = sc.sequenceFile("./test_data", classOf[NullWritable], classOf[Text])
rdd2.collect
val rdd3 = sc.sequenceFile[NullWritable,Text]("./test_data")
rdd3.collect


