You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by t1ny <wb...@gmail.com> on 2014/09/18 17:48:25 UTC
Spark Streaming and ReactiveMongo
Hello all,Spark newbie here.We are trying to use Spark Streaming
(unfortunately stuck on version 0.9.1 of Spark) to stream data out of
MongoDB.ReactiveMongo (http://reactivemongo.org/) is a scala driver that
enables you to stream a MongoDB capped collection (in our case, the
Oplog).Given that MongoDB isn't supported natively as a Spark Streaming
source, how would you go about implementing a custom Spark Streaming
receiver that uses the ReactiveMongo driver ?In particular, how would you
circumvent the fact that the ReactiveMongo driver isn't serializable ?Thanks
!
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-and-ReactiveMongo-tp14568.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Spark Streaming and ReactiveMongo
Posted by Soumitra Kumar <ku...@gmail.com>.
onStart should be non-blocking. You may try to create a thread in onStart instead.
----- Original Message -----
From: "t1ny" <wb...@gmail.com>
To: user@spark.incubator.apache.org
Sent: Friday, September 19, 2014 1:26:42 AM
Subject: Re: Spark Streaming and ReactiveMongo
Here's what we've tried so far as a first example of a custom Mongo receiver
:
/class MongoStreamReceiver(host: String)
extends NetworkReceiver[String] {
protected lazy val blocksGenerator: BlockGenerator =
new BlockGenerator(StorageLevel.MEMORY_AND_DISK_SER_2)
protected def onStart() = {
blocksGenerator.start()
val driver = new MongoDriver
val connection = driver.connection(List("m01-pdp2"))
val db = connection.db("local")
val collection = db.collection[BSONCollection]("oplog.rs")
val query = BSONDocument("op" -> "i")
val enumerator =
collection.
find(query).
options(QueryOpts().tailable.awaitData).
cursor[BSONDocument].
enumerate()
val processor: Iteratee[BSONDocument, Unit] =
Iteratee.foreach { doc =>
blocksGenerator += BSONDocument.pretty(doc)
}
enumerator |>>> processor
}
protected def onStop() {
blocksGenerator.stop()
}
}
/
However this code doesn't run, probably because of serialization issues (no
logs to confirm this though, just no data in the stream...)
Note that if we comment out the ReactiveMongo-related code and put something
like this instead, the code runs fine :
/ for (i <- 0 until 1000) {
blocksGenerator += "hello world"
Thread.sleep(1000)
}
/
The Java socket example (found here
<http://spark.apache.org/docs/0.9.1/streaming-custom-receivers.html> )
works fine as well.
Any hints ?
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-and-ReactiveMongo-tp14568p14661.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Spark Streaming and ReactiveMongo
Posted by t1ny <wb...@gmail.com>.
Here's what we've tried so far as a first example of a custom Mongo receiver
:
/class MongoStreamReceiver(host: String)
extends NetworkReceiver[String] {
protected lazy val blocksGenerator: BlockGenerator =
new BlockGenerator(StorageLevel.MEMORY_AND_DISK_SER_2)
protected def onStart() = {
blocksGenerator.start()
val driver = new MongoDriver
val connection = driver.connection(List("m01-pdp2"))
val db = connection.db("local")
val collection = db.collection[BSONCollection]("oplog.rs")
val query = BSONDocument("op" -> "i")
val enumerator =
collection.
find(query).
options(QueryOpts().tailable.awaitData).
cursor[BSONDocument].
enumerate()
val processor: Iteratee[BSONDocument, Unit] =
Iteratee.foreach { doc =>
blocksGenerator += BSONDocument.pretty(doc)
}
enumerator |>>> processor
}
protected def onStop() {
blocksGenerator.stop()
}
}
/
However this code doesn't run, probably because of serialization issues (no
logs to confirm this though, just no data in the stream...)
Note that if we comment out the ReactiveMongo-related code and put something
like this instead, the code runs fine :
/ for (i <- 0 until 1000) {
blocksGenerator += "hello world"
Thread.sleep(1000)
}
/
The Java socket example (found here
<http://spark.apache.org/docs/0.9.1/streaming-custom-receivers.html> )
works fine as well.
Any hints ?
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-and-ReactiveMongo-tp14568p14661.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org