Posted to user@spark.apache.org by Gerard Maas <ge...@gmail.com> on 2014/06/14 22:32:05 UTC

Re: guidance on simple unit testing with Spark

On Jun 14, 2014 4:05 AM, "Matei Zaharia" <ma...@gmail.com> wrote:

> You need to factor your program so that it’s not just a main(). This is
> not a Spark-specific issue, it’s about how you’d unit test any program in
> general. In this case, your main() creates a SparkContext, so you can’t
> pass one from outside, and your code has to read data from a file and write
> it to a file. It would be better to move your code for transforming data
> into a new function:
>
> def processData(lines: RDD[String]): RDD[String] = {
>   // build and return your “res” variable
> }
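>
> For example, a rough version of processData, lifted from the pipeline in
> your GetInfo code below (assuming json4s for the parsing, as in your code,
> and that you want the final "%s, %d" strings back), might look like:
>
> import org.apache.spark.SparkContext._
> import org.apache.spark.rdd.RDD
> import org.json4s.jackson.JsonMethods.parse
>
> def processData(lines: RDD[String]): RDD[String] = {
>   lines.map(line => parse(line))
>     .map { json =>
>       // keep formats inside the closure so it never has to be serialized
>       implicit lazy val formats = org.json4s.DefaultFormats
>       val aid = (json \ "d" \ "TypeID").extract[Int]
>       val ts = (json \ "d" \ "TimeStamp").extract[Long]
>       val gid = (json \ "d" \ "ID").extract[String]
>       (aid, ts, gid)
>     }
>     .groupBy(_._3)      // group the (TypeID, TimeStamp, ID) tuples by ID
>     .sortByKey(true)
>     .map { case (gid, recs) => "%s, %d".format(gid, recs.map(_._2).max) }
> }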
>
> Then you can unit-test this directly on data you create in your program:
>
> val myLines = sc.parallelize(Seq("line 1", "line 2"))
> val result = GetInfo.processData(myLines).collect()
> assert(result.toSet === Set("res 1", "res 2"))
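>
> And a complete test class, just as a sketch with FunSuite (specs2 works the
> same way; the important parts are creating a local SparkContext, stopping it
> afterwards, and assuming processData is a method on the GetInfo object),
> might look roughly like this:
>
> import org.apache.spark.{SparkConf, SparkContext}
> import org.scalatest.{BeforeAndAfterAll, FunSuite}
>
> class GetInfoSuite extends FunSuite with BeforeAndAfterAll {
>   private var sc: SparkContext = _
>
>   override def beforeAll() {
>     // "local[2]" runs Spark inside the test JVM, so no cluster is needed
>     val conf = new SparkConf().setMaster("local[2]").setAppName("GetInfoTest")
>     sc = new SparkContext(conf)
>   }
>
>   override def afterAll() {
>     sc.stop()  // release the context so other suites can create their own
>   }
>
>   test("keeps the latest timestamp per ID") {
>     val lines = sc.parallelize(Seq(
>       """{"d": {"TypeID": 10, "TimeStamp": 1234, "ID": "ID1"}}""",
>       """{"d": {"TypeID": 11, "TimeStamp": 5678, "ID": "ID1"}}"""
>     ))
>     val result = GetInfo.processData(lines).collect()
>     assert(result.toSet === Set("ID1, 5678"))
>   }
> }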
>
> Matei
>
> On Jun 13, 2014, at 2:42 PM, SK <sk...@gmail.com> wrote:
>
> > Hi,
> >
> > I have looked through some of the  test examples and also the brief
> > documentation on unit testing at
> > http://spark.apache.org/docs/latest/programming-guide.html#unit-testing,
> but
> > still don't have a good understanding of writing unit tests using the
> Spark
> > framework. Previously, I have written unit tests using specs2 framework
> and
> > have got them to work in Scalding.  I tried to use the specs2 framework
> with
> > Spark, but could not find any simple examples I could follow. I am open
> to
> > specs2 or FunSuite, whichever works best with Spark. I would like some
> > additional guidance, or some simple sample code using specs2 or
> FunSuite. My
> > code is provided below.
> >
> >
> > I have the following code in src/main/scala/GetInfo.scala. It reads a
> JSON
> > file and extracts some data. It takes the input file (args(0)) and output
> > file (args(1)) as arguments.
> >
> > import org.apache.spark.{SparkConf, SparkContext}
> > import org.apache.spark.SparkContext._
> > import org.json4s.jackson.JsonMethods.parse  // or org.json4s.native.JsonMethods.parse
> >
> > object GetInfo{
> >
> >   def main(args: Array[String]) {
> >         val inp_file = args(0)
> >         val conf = new SparkConf().setAppName("GetInfo")
> >         val sc = new SparkContext(conf)
> >         val res = sc.textFile(inp_file)
> >                   .map(line => { parse(line) })
> >                   .map(json =>
> >                      {
> >                         implicit lazy val formats = org.json4s.DefaultFormats
> >                         val aid = (json \ "d" \ "TypeID").extract[Int]
> >                         val ts = (json \ "d" \ "TimeStamp").extract[Long]
> >                         val gid = (json \ "d" \ "ID").extract[String]
> >                         (aid, ts, gid)
> >                      }
> >                    )
> >                   .groupBy(tup => tup._3)
> >                   .sortByKey(true)
> >                   .map(g => (g._1, g._2.map(_._2).max))
> >         res.map(tuple => "%s, %d".format(tuple._1, tuple._2)).saveAsTextFile(args(1))
> >   }
> > }
> >
> >
> > I would like to test the above code. My unit test is in src/test/scala.
> The
> > code I have so far for the unit test appears below:
> >
> > import org.apache.spark._
> > import org.specs2.mutable._
> >
> > class GetInfoTest extends Specification with java.io.Serializable{
> >
> >     val data = List(
> >      """{"d": {"TypeID": 10, "TimeStamp": 1234, "ID": "ID1"}}""",
> >      """{"d": {"TypeID": 11, "TimeStamp": 5678, "ID": "ID1"}}""",
> >      """{"d": {"TypeID": 10, "TimeStamp": 1357, "ID": "ID2"}}""",
> >      """{"d": {"TypeID": 11, "TimeStamp": 2468, "ID": "ID2"}}"""
> >    )
> >
> >     val expected_out = List(
> >        ("ID1", 5678),
> >        ("ID2", 2468)
> >    )
> >
> >    "A GetInfo job" should {
> >             //***** How do I pass "data" defined above as the input, and
> > get at the output, which GetInfo expects as file arguments? ******
> >             val sc = new SparkContext("local", "GetInfo")
> >
> >             //*** how do I get the output ***
> >
> >              //assuming out_buffer has the output I want to match it to
> the
> > expected output
> >             "match expected output" in {
> >                      ( out_buffer == expected_out) must beTrue
> >             }
> >     }
> >
> > }
> >
> > I would like some help with the tasks marked with "****" in the unit test
> > code above. If specs2 is not the right way to go, I am also open to
> > FunSuite. I would like to know how to pass the input while calling my
> > program from the unit test and get the output.
> >
> > Thanks for your help.
> >
> >
> >
> >
> >
> > --
> > View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/guidance-on-simple-unit-testing-with-Spark-tp7604.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>