Posted to user@spark.apache.org by t1ny <wb...@gmail.com> on 2014/09/18 17:48:25 UTC

Spark Streaming and ReactiveMongo

Hello all,

Spark newbie here. We are trying to use Spark Streaming (unfortunately stuck on version 0.9.1 of Spark) to stream data out of MongoDB. ReactiveMongo (http://reactivemongo.org/) is a Scala driver that lets you stream a MongoDB capped collection (in our case, the oplog).

Given that MongoDB isn't supported natively as a Spark Streaming source, how would you go about implementing a custom Spark Streaming receiver that uses the ReactiveMongo driver? In particular, how would you work around the fact that the ReactiveMongo driver isn't serializable?
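For concreteness, here is roughly the shape of the job we have in mind (a hypothetical sketch against the 0.9.x API, with made-up names; MongoStreamReceiver is the custom receiver we are asking how to write):

import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical wiring sketch: the receiver instance passed to networkStream()
// is serialized and shipped to a worker node, which is why a non-serializable
// ReactiveMongo driver held by the receiver worries us.
val ssc = new StreamingContext("local[2]", "MongoOplogStream", Seconds(5))
val oplog = ssc.networkStream[String](new MongoStreamReceiver("m01-pdp2"))
oplog.print()
ssc.start()
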
Thanks!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-and-ReactiveMongo-tp14568.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark Streaming and ReactiveMongo

Posted by Soumitra Kumar <ku...@gmail.com>.
onStart() should be non-blocking. You may try creating a thread in onStart() and doing the ReactiveMongo work there instead.
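For example, something along these lines (an untested sketch, reusing the blocksGenerator and the ReactiveMongo setup from your code quoted below):

  protected def onStart() = {
    blocksGenerator.start()

    // Run the blocking ReactiveMongo work on a separate thread so that
    // onStart() returns immediately.
    val tailer = new Thread("mongo-oplog-tailer") {
      override def run() {
        // driver/connection/collection setup and `enumerator |>>> processor`
        // from your current onStart() go here
      }
    }
    tailer.setDaemon(true)
    tailer.start()
  }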

----- Original Message -----
From: "t1ny" <wb...@gmail.com>
To: user@spark.incubator.apache.org
Sent: Friday, September 19, 2014 1:26:42 AM
Subject: Re: Spark Streaming and ReactiveMongo

Here's what we've tried so far as a first example of a custom Mongo receiver:

class MongoStreamReceiver(host: String)
  extends NetworkReceiver[String] {

  protected lazy val blocksGenerator: BlockGenerator =
    new BlockGenerator(StorageLevel.MEMORY_AND_DISK_SER_2)

  protected def onStart() = {
    blocksGenerator.start()

    // Connect to the replica set and open the local oplog collection.
    val driver = new MongoDriver
    val connection = driver.connection(List("m01-pdp2"))
    val db = connection.db("local")
    val collection = db.collection[BSONCollection]("oplog.rs")

    // Only keep insert operations.
    val query = BSONDocument("op" -> "i")

    // Tailable cursor over the capped collection, exposed as an enumerator.
    val enumerator =
      collection.
        find(query).
        options(QueryOpts().tailable.awaitData).
        cursor[BSONDocument].
        enumerate()

    // Push each matching document into Spark's block generator.
    val processor: Iteratee[BSONDocument, Unit] =
      Iteratee.foreach { doc =>
        blocksGenerator += BSONDocument.pretty(doc)
      }

    enumerator |>>> processor
  }

  protected def onStop() {
    blocksGenerator.stop()
  }
}
However, this code doesn't work, probably because of serialization issues (there are no logs to confirm this though, just no data in the stream...).
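One thing we may try to confirm this locally (a hypothetical check, we haven't run it yet): serialize the receiver the same way Spark would before shipping it to a worker, and see whether it throws.

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// If anything reachable from the receiver instance is not serializable,
// this throws java.io.NotSerializableException, which would confirm the
// serialization hypothesis.
val oos = new ObjectOutputStream(new ByteArrayOutputStream())
oos.writeObject(new MongoStreamReceiver("m01-pdp2"))
oos.close()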

Note that if we comment out the ReactiveMongo-related code and put something like this instead, the code runs fine:

    for (i <- 0 until 1000) {
      blocksGenerator += "hello world"
      Thread.sleep(1000)
    }
The Java socket example (found here: http://spark.apache.org/docs/0.9.1/streaming-custom-receivers.html) works fine as well.

Any hints?




---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

