Posted to user@hbase.apache.org by SivaRam Bollineni <sh...@gmail.com> on 2017/01/27 06:23:59 UTC

Spark Streaming checkpoint recovery failure for HBaseContext

hi,

First of all, thanks a ton for the HBaseContext API; it's very useful and
performant for our use cases. We use HBaseContext in Spark Streaming with
checkpointing, and we have a problem recovering from the checkpoint. Below is
the *NullPointerException* stack trace. I am sure it's because the SparkContext
object is passed as null to HBaseContext. SparkContext is not serializable,
and without a SparkContext we can't create an HBaseContext. Please help me
resolve this issue.

java.lang.NullPointerException
        at com.cloudera.spark.hbase.HBaseContext.<init>(HBaseContext.scala:69)
        at com.abc.bbc.xyz.MySparkStreaming$$anonfun$createStreamingContext$1.apply(MySparkStreaming.scala:118)
        at com.abc.bbc.xyz.MySparkStreaming$$anonfun$createStreamingContext$1.apply(MySparkStreaming.scala:94)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:631)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:631)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:42)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:399)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:40)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
        at scala.util.Try$.apply(Try.scala:161)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:34)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:207)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:207)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:207)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:206)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

Thanks !
Siva

Re: Spark Streaming checkpoint recovery failure for HBaseContext

Posted by Sean Busbey <bu...@apache.org>.
HBaseContext makes use of broadcast variables, so to use it with
checkpointing in a Spark Streaming application you have to wrap it in
a singleton that can be recreated on restore from the checkpoint.

I have an example of this that I was working on, but have not yet
finished, for the hbase-downstreamer project:

https://github.com/busbey/hbase-downstreamer/blob/hbase-spark-example/src/main/scala/org/hbase/downstreamer/spark/RecoverableNetworkWordCountHBase.scala
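
A minimal sketch of that singleton pattern, assuming the Cloudera
SparkOnHBase constructor seen in the stack trace; the holder object and
the commented-out foreachRDD usage below are illustrative assumptions,
not the code from the linked example. The point is that the HBaseContext
is never captured in a checkpointed closure; it is rebuilt on demand from
the live SparkContext after recovery:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkContext
import com.cloudera.spark.hbase.HBaseContext

// Hypothetical singleton holder. The instance is marked @transient so it
// is never serialized into the checkpoint; after a restore it is simply
// null and gets recreated from the current (live) SparkContext.
object HBaseContextHolder {
  @transient private var instance: HBaseContext = _

  def getInstance(sc: SparkContext, conf: Configuration): HBaseContext =
    synchronized {
      if (instance == null) {
        instance = new HBaseContext(sc, conf)
      }
      instance
    }
}

// Inside the function passed to foreachRDD, fetch the instance per batch
// instead of capturing an HBaseContext built before checkpoint recovery:
//
//   stream.foreachRDD { rdd =>
//     val hbaseContext = HBaseContextHolder.getInstance(
//       rdd.sparkContext, HBaseConfiguration.create())
//     // ... use hbaseContext (e.g. a bulk put) against the RDD ...
//   }
```

This mirrors the pattern the Spark Streaming documentation recommends for
broadcast variables and accumulators with checkpointing: lazily
instantiated singletons that are re-created after the driver restarts.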




Re: Spark Streaming checkpoint recovery failure for HBaseContext

Posted by Chetan Khatri <ch...@gmail.com>.
The hbase-spark module is not production ready; it has some open issues
with pending patches:
- https://issues.apache.org/jira/browse/HBASE-16179
- https://issues.apache.org/jira/browse/HBASE-14160

But by adjusting the configuration accordingly and building with
build.sbt, I managed to get it running. Would you share the exact code
block showing how you configured HBaseContext, along with your pom.xml?
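
For reference, a pom.xml dependency block for the Cloudera Labs
SparkOnHBase artifact (the com.cloudera.spark.hbase package in the stack
trace) would look roughly like the following; the exact coordinates,
version, and repository URL here are assumptions, so verify them against
the repository you actually use:

```xml
<!-- Illustrative only: check groupId/artifactId/version for your setup -->
<repositories>
  <repository>
    <id>cloudera-repo</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>com.cloudera</groupId>
    <artifactId>spark-hbase</artifactId>
    <version>0.0.2-clabs</version>
  </dependency>
</dependencies>
```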

