Posted to user@spark.apache.org by Andy Davidson <An...@SantaCruzIntegration.com> on 2016/05/11 22:55:30 UTC
How to transform a JSON string into a Java HashMap<> java.io.NotSerializableException
I have a streaming app that receives very complicated JSON (Twitter statuses).
I would like to work with each status as a hash map; it would be very difficult
to define a POJO for this JSON. (I cannot use twitter4j.)
// map each JSON string to a Map<String, String>
JavaRDD<Hashtable<String, String>> jsonMapRDD = powerTrackRDD.map(
    new Function<String, Hashtable<String, String>>() {
        private static final long serialVersionUID = 1L;

        @Override
        public Hashtable<String, String> call(String line) throws Exception {
            // HashMap<String, String> hm = JsonUtils.jsonToHashMap(line);
            // Hashtable<String, String> ret = new Hashtable<String, String>(hm);
            Hashtable<String, String> ret = null;
            return ret;
        }
    });
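For context, a minimal sketch of what a jsonToHashMap helper might look like. This is hypothetical: it only handles a flat JSON object with string values, while real tweet JSON is deeply nested, so in practice a library such as Jackson's ObjectMapper would do the parsing.

```java
import java.util.HashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JsonToMap {
    // Toy parser: matches "key":"value" pairs in a flat JSON object,
    // e.g. {"user":"andy","lang":"en"}. It ignores nesting, numbers,
    // escapes, etc. -- illustration only.
    static final Pattern PAIR =
        Pattern.compile("\"([^\"]+)\"\\s*:\\s*\"([^\"]*)\"");

    public static HashMap<String, String> jsonToHashMap(String line) {
        HashMap<String, String> map = new HashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            map.put(m.group(1), m.group(2));
        }
        return map;
    }

    public static void main(String[] args) {
        HashMap<String, String> m =
            jsonToHashMap("{\"user\":\"andy\",\"lang\":\"en\"}");
        System.out.println(m.get("user")); // andy
        System.out.println(m.get("lang")); // en
    }
}
```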
Using the sqlContext works, however I assume this is going to be slow and
error prone, given that many key/value pairs are likely to be missing.

DataFrame df = sqlContext.read().json(getFilePath().toString());
df.printSchema();

Any suggestions would be greatly appreciated.
Andy
org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
    at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:323)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.map(RDD.scala:323)
    at org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:96)
    at org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:46)
    at com.pws.sparkStreaming.collector.SavePowerTrackActivityStream.test(SavePowerTrackActivityStream.java:34)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
Caused by: java.io.NotSerializableException: com.pws.sparkStreaming.collector.SavePowerTrackActivityStream
Serialization stack:
    - object not serializable (class: com.pws.sparkStreaming.collector.SavePowerTrackActivityStream, value: com.pws.sparkStreaming.collector.SavePowerTrackActivityStream@f438904)
    - field (class: com.pws.sparkStreaming.collector.SavePowerTrackActivityStream$1, name: this$0, type: class com.pws.sparkStreaming.collector.SavePowerTrackActivityStream)
    - object (class com.pws.sparkStreaming.collector.SavePowerTrackActivityStream$1, com.pws.sparkStreaming.collector.SavePowerTrackActivityStream$1@3fa7df1)
    - field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
    - object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, <function1>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
    ... 37 more
Re: How to transform a JSON string into a Java HashMap<> java.io.NotSerializableException
Posted by Marcelo Vanzin <va...@cloudera.com>.
Is the class mentioned in the exception below the parent class of the
anonymous "Function" class you're creating?
If so, you may need to make it serializable. Or make your function a
proper "standalone" class (either a nested static class or a top-level
one).
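The difference can be sketched in plain Java, without Spark. Here `Fn` is a hypothetical stand-in for org.apache.spark.api.java.function.Function (which already extends Serializable in the real API), and `Main` plays the role of the non-serializable enclosing class from the stack trace:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Main {
    // Stand-in for Spark's Function interface, which extends Serializable.
    interface Fn<T, R> extends Serializable {
        R call(T t) throws Exception;
    }

    // A static nested class carries no hidden this$0 reference to the
    // enclosing class, so Java serialization only has to serialize the
    // function itself. This is the "standalone class" fix.
    static class ParseLine implements Fn<String, String> {
        private static final long serialVersionUID = 1L;

        @Override
        public String call(String line) {
            return line.trim();
        }
    }

    // A non-static inner (or anonymous) class captures the enclosing
    // instance via this$0; since Main is not Serializable, serializing
    // the function fails with java.io.NotSerializableException -- the
    // same failure shown in the stack trace above.
    class InnerFn implements Fn<String, String> {
        @Override
        public String call(String line) {
            return line;
        }
    }

    static boolean serializes(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            return false; // NotSerializableException is an IOException
        }
    }

    public static void main(String[] args) {
        System.out.println(serializes(new ParseLine()));          // true
        System.out.println(serializes(new Main().new InnerFn())); // false
    }
}
```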
On Wed, May 11, 2016 at 3:55 PM, Andy Davidson
<An...@santacruzintegration.com> wrote:
> Caused by: java.io.NotSerializableException:
> com.pws.sparkStreaming.collector.SavePowerTrackActivityStream
--
Marcelo
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org