You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@toree.apache.org by "Phil Berkland (JIRA)" <ji...@apache.org> on 2016/11/01 20:08:58 UTC
[jira] [Commented] (TOREE-351) Kafka Direct Streaming does not work
in torre
[ https://issues.apache.org/jira/browse/TOREE-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15626526#comment-15626526 ]
Phil Berkland commented on TOREE-351:
-------------------------------------
possibly related - https://issues.scala-lang.org/browse/SI-9587
> Kafka Direct Streaming does not work in torre
> ---------------------------------------------
>
> Key: TOREE-351
> URL: https://issues.apache.org/jira/browse/TOREE-351
> Project: TOREE
> Issue Type: Bug
> Affects Versions: 0.1.0
> Environment: spark 2.0, toree built from master branch,
> in kernel.json added "SPARK_OPTS": "--jars file:/path/to/spark-streaming-kafka-0-10-assembly_2.11-2.0.0.jar" since "%addjar" did not seem to work.
>
> Reporter: Phil Berkland
>
> The following code with throw a CastCastException when running in a notebook:
> -------------
> import org.apache.kafka.clients.consumer.ConsumerRecord
> import org.apache.kafka.common.serialization.StringDeserializer
> import org.apache.spark.SparkConf
> import org.apache.spark.streaming._
> import org.apache.spark.streaming.kafka010._
> import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
> import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
> val kafkaParams = Map[String, Object](
> "bootstrap.servers" -> "localhost:9092,anotherhost:9092",
> "key.deserializer" -> classOf[StringDeserializer],
> "value.deserializer" -> classOf[StringDeserializer],
> "group.id" -> "example",
> "auto.offset.reset" -> "latest",
> "enable.auto.commit" -> (false: java.lang.Boolean)
> )
> val streamingContext = new StreamingContext(sc, Seconds(1))
> val topics = Array("topicA", "topicB")
> val stream = KafkaUtils.createDirectStream[String, String](
> streamingContext,
> PreferConsistent,
> Subscribe[String, String](topics, kafkaParams)
> )
> var rdd=stream.map(record => (record.key, record.value))
> rdd.print
> streamingContext.start
> streamingContext.awaitTermination
> ---------------
> org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
> at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:702)
> at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:557)
> at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:540)
> at org.apache.spark.streaming.kafka010.Subscribe.onStart(ConsumerStrategy.scala:83)
> at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.consumer(DirectKafkaInputDStream.scala:74)
> at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.start(DirectKafkaInputDStream.scala:225)
> at org.apache.spark.streaming.DStreamGraph$$anonfun$start$5.apply(DStreamGraph.scala:49)
> at org.apache.spark.streaming.DStreamGraph$$anonfun$start$5.apply(DStreamGraph.scala:49)
> at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach_quick(ParArray.scala:143)
> at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach(ParArray.scala:136)
> at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:972)
> at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
> at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
> at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
> at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
> at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:969)
> at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:152)
> at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
> at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> at ... run in separate thread using org.apache.spark.util.ThreadUtils ... ()
> at org.apache.spark.streaming.StreamingContext.liftedTree1$1(StreamingContext.scala:578)
> at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:572)
> ... 44 elided
> Caused by: java.lang.ClassCastException: class org.apache.kafka.clients.consumer.RangeAssignor
> at java.lang.Class.asSubclass(Class.java:3404)
> at org.apache.kafka.common.utils.Utils.newInstance(Utils.java:332)
> at org.apache.kafka.common.config.AbstractConfig.getConfiguredInstances(AbstractConfig.java:225)
> at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:637)
> at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:557)
> at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:540)
> at org.apache.spark.streaming.kafka010.Subscribe.onStart(ConsumerStrategy.scala:83)
> at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.consumer(DirectKafkaInputDStream.scala:74)
> at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.start(DirectKafkaInputDStream.scala:225)
> at org.apache.spark.streaming.DStreamGraph$$anonfun$start$5.apply(DStreamGraph.scala:49)
> at org.apache.spark.streaming.DStreamGraph$$anonfun$start$5.apply(DStreamGraph.scala:49)
> at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach_quick(ParArray.scala:143)
> at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach(ParArray.scala:136)
> at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:972)
> at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
> at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
> at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
> at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
> at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:969)
> at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:152)
> at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
> at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> --------------------
> This same code works ok when running inside spark-shell.
> Kafka is using Thread.currentThread().getContextClassLoader() to load a class, but the class loader is (incorrectly) different than the original loader.
> This appears to be the same root problem as TOREE-349.
> Note that all thread management is being done by Kafka, so we have no control over threading or setting contextClassLoader in them.
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)