Posted to dev@spark.apache.org by Adam Roberts <AR...@uk.ibm.com> on 2016/06/14 11:18:12 UTC

Databricks SparkPerf with Spark 2.0

Hi, I'm working on getting "SparkPerf" (
https://github.com/databricks/spark-perf) running with Spark 2.0. I noticed a 
few pull requests have not yet been accepted, so I'm concerned this project 
has been abandoned - it's proven very useful in the past for quality assurance, 
as we can easily exercise lots of Spark functions on a cluster (perhaps 
exposing problems that don't surface in the Spark unit tests).

I want to use Scala 2.11.8 and Spark 2.0.0, so I'm making my way through 
various files and am currently faced with a NoSuchMethodError:

NoSuchMethodError: 
org/apache/spark/SparkContext.rddToPairRDDFunctions(Lorg/apache/spark/rdd/RDD;Lscala/reflect/ClassTag;Lscala/reflect/ClassTag;Lscala/math/Ordering;)Lorg/apache/spark/rdd/PairRDDFunctions; 
at spark.perf.AggregateByKey.runTest(KVDataTest.scala:137) 

class AggregateByKey(sc: SparkContext) extends KVDataTest(sc) {
  override def runTest(rdd: RDD[_], reduceTasks: Int) {
    rdd.asInstanceOf[RDD[(String, String)]]
      .map { case (k, v) => (k, v.toInt) }
      .reduceByKey(_ + _, reduceTasks)
      .count()
  }
}

Grepping shows
./spark-tests/target/streams/compile/incCompileSetup/$global/streams/inc_compile_2.10:/home/aroberts/Desktop/spark-perf/spark-tests/src/main/scala/spark/perf/KVDataTest.scala 
-> rddToPairRDDFunctions 

The scheduling-throughput tests complete fine, but the problem here is seen 
with agg-by-key (and likely other modules will need fixing owing to API changes 
between 1.x and 2.x, which I guess is the cause of the above problem).
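
If that is indeed the cause, the test code itself probably doesn't need to 
change: since Spark 1.3 the PairRDDFunctions conversion has lived on the RDD 
companion object (the SparkContext versions were left as deprecated 
forwarders), so a recompile against 2.0.0 should bind to it there. A minimal 
sketch of what that binding looks like when made explicit (same shape as the 
snippet above, not the project's actual code):

import org.apache.spark.rdd.{PairRDDFunctions, RDD}

// Sketch only: calls the conversion on the RDD companion object explicitly,
// instead of the SparkContext forwarder that the stack trace above can no
// longer find at runtime.
def aggregateByKey(rdd: RDD[(String, String)], reduceTasks: Int): Long = {
  val pairs: PairRDDFunctions[String, Int] =
    RDD.rddToPairRDDFunctions(rdd.map { case (k, v) => (k, v.toInt) })
  pairs.reduceByKey(_ + _, reduceTasks).count()
}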

Has anybody already made good progress here? I'd like to work together 
and get this available for everyone; I'll be churning through it either 
way. I'll be looking at HiBench also.

My next step is to use sbt -Dspark.version=2.0.0 (2.0.0-preview?) and 
work from there, although I figured the prep-tests stage would do this for 
me (how else is it going to build?).
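
For reference, roughly what I have in mind for the dependency side - the 
property name, default and module list here are my guesses, not what 
spark-tests/project/SparkTestsBuild.scala actually contains today:

import sbt._
import Keys._

object SparkTestsBuildSketch {
  // Let `sbt -Dspark.version=2.0.0-preview ...` override the default instead of
  // hard-coding 1.5.2 (assumption: the build reads the spark.version property).
  val sparkVersion = sys.props.getOrElse("spark.version", "2.0.0-preview")

  val sparkDependencies: Seq[Def.Setting[_]] = Seq(
    scalaVersion := "2.11.8",
    libraryDependencies ++= Seq(
      // "provided": compile against the same Spark the cluster supplies at
      // runtime, keeping the compile-time and runtime classpaths in step.
      "org.apache.spark" %% "spark-core"  % sparkVersion % "provided",
      "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided"
    )
  )
}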

Cheers,




Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

Re: Databricks SparkPerf with Spark 2.0

Posted by Michael Armbrust <mi...@databricks.com>.
NoSuchMethodError always means that you are compiling against a different
classpath than is available at runtime, so it sounds like you are on the
right track.  The project is not abandoned, we're just busy with the
release.  It would be great if you could open a pull request.

On Tue, Jun 14, 2016 at 4:56 AM, Adam Roberts <AR...@uk.ibm.com> wrote:

> Fixed the problem from my original mail: grepped for spark.version, noticed
> some instances of 1.5.2 being declared, and changed them to 2.0.0-preview in
> spark-tests/project/SparkTestsBuild.scala
>
> Next one to fix is:
> 16/06/14 12:52:44 INFO ContextCleaner: Cleaned shuffle 9
> Exception in thread "main" java.lang.NoSuchMethodError:
> org/json4s/jackson/JsonMethods$.render$default$2(Lorg/json4s/JsonAST$JValue;)Lorg/json4s/Formats;
>
> I'm going to log this and further progress under "Issues" for the project
> itself (probably need to change org.json4s version in
> SparkTestsBuild.scala, now I know this file is super important), so the
> emails here will at least point people there.
>
> Cheers,

Re: Databricks SparkPerf with Spark 2.0

Posted by Adam Roberts <AR...@uk.ibm.com>.
Fixed the problem from my original mail: grepped for spark.version, noticed 
some instances of 1.5.2 being declared, and changed them to 2.0.0-preview in 
spark-tests/project/SparkTestsBuild.scala

Next one to fix is:
16/06/14 12:52:44 INFO ContextCleaner: Cleaned shuffle 9
Exception in thread "main" java.lang.NoSuchMethodError: 
org/json4s/jackson/JsonMethods$.render$default$2(Lorg/json4s/JsonAST$JValue;)Lorg/json4s/Formats;

I'm going to log this and further progress under "Issues" for the project 
itself (probably need to change the org.json4s version in 
SparkTestsBuild.scala, now that I know this file is super important), so the 
emails here will at least point people there.
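
Roughly the change I expect (just a sketch - the right json4s version is 
whatever the Spark 2.0.0-preview pom actually pulls in, so check that rather 
than trusting the number below):

// Sketch for SparkTestsBuild.scala: keep json4s in step with the copy Spark
// itself ships so the compile-time and runtime JsonMethods signatures match.
// 3.2.11 is my guess for 2.0.0-preview; "provided" so Spark's own copy is
// used at runtime.
libraryDependencies ++= Seq(
  "org.json4s" %% "json4s-jackson" % "3.2.11" % "provided"
)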

Cheers,