Posted to dev@spark.apache.org by Ewan Higgs <ew...@ugent.be> on 2015/01/14 14:33:45 UTC

Spark-perf terasort WIP branch

Hi all,
I'm trying to build the Spark-perf WIP code, but I'm hitting some errors
related to the Hadoop APIs. I presume a Hadoop version is pinned somewhere
in the build and the compiler is resolving against that, but I can't find
where it's set.

The errors are as follows:

[info] Compiling 15 Scala sources and 2 Java sources to 
/home/ehiggs/src/spark-perf/spark-tests/target/scala-2.10/classes...
[error] 
/home/ehiggs/src/spark-perf/spark-tests/src/main/scala/spark/perf/terasort/TeraInputFormat.scala:40: 
object task is not a member of package org.apache.hadoop.mapreduce
[error] import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
[error]                                    ^
[error] 
/home/ehiggs/src/spark-perf/spark-tests/src/main/scala/spark/perf/terasort/TeraInputFormat.scala:132: 
not found: type TaskAttemptContextImpl
[error]             val context = new TaskAttemptContextImpl(
[error]                               ^
[error] 
/home/ehiggs/src/spark-perf/spark-tests/src/main/scala/spark/perf/terasort/TeraScheduler.scala:37: 
object TTConfig is not a member of package 
org.apache.hadoop.mapreduce.server.tasktracker
[error] import org.apache.hadoop.mapreduce.server.tasktracker.TTConfig
[error]        ^
[error] 
/home/ehiggs/src/spark-perf/spark-tests/src/main/scala/spark/perf/terasort/TeraScheduler.scala:91: 
not found: value TTConfig
[error]   var slotsPerHost : Int = conf.getInt(TTConfig.TT_MAP_SLOTS, 4)
[error]                                        ^
[error] 
/home/ehiggs/src/spark-perf/spark-tests/src/main/scala/spark/perf/terasort/TeraSortAll.scala:7: 
value run is not a member of org.apache.spark.examples.terasort.TeraGen
[error]     tg.run(Array[String]("10M", "/tmp/terasort_in"))
[error]        ^
[error] 
/home/ehiggs/src/spark-perf/spark-tests/src/main/scala/spark/perf/terasort/TeraSortAll.scala:9: 
value run is not a member of org.apache.spark.examples.terasort.TeraSort
[error]     ts.run(Array[String]("/tmp/terasort_in", "/tmp/terasort_out"))
[error]        ^
[error] 6 errors found
[error] (compile:compile) Compilation failed
[error] Total time: 13 s, completed 05-Jan-2015 12:21:47

I can build the same code if it's in the Spark tree using the following 
command:
mvn -Dhadoop.version=2.5.0 -DskipTests=true install

Is there a way I can convince spark-perf to build this code with the 
appropriate Hadoop library version? I tried to apply the following to 
spark-tests/project/SparkTestsBuild.scala but it didn't seem to work as 
I expected:

$ git diff project/SparkTestsBuild.scala
diff --git a/spark-tests/project/SparkTestsBuild.scala 
b/spark-tests/project/SparkTestsBuild.scala
index 4116326..4ed5f0c 100644
--- a/spark-tests/project/SparkTestsBuild.scala
+++ b/spark-tests/project/SparkTestsBuild.scala
@@ -16,7 +16,9 @@ object SparkTestsBuild extends Build {
          "org.scalatest" %% "scalatest" % "2.2.1" % "test",
          "com.google.guava" % "guava" % "14.0.1",
          "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
-        "org.json4s" %% "json4s-native" % "3.2.9"
+        "org.json4s" %% "json4s-native" % "3.2.9",
+        "org.apache.hadoop" % "hadoop-common" % "2.5.0",
+        "org.apache.hadoop" % "hadoop-mapreduce" % "2.5.0"
        ),
        test in assembly := {},
        outputPath in assembly := 
file("target/spark-perf-tests-assembly.jar"),
@@ -36,4 +38,4 @@ object SparkTestsBuild extends Build {
          case _ => MergeStrategy.first
        }
      ))
-}
\ No newline at end of file
+}
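One possible culprit, untested: `org.apache.hadoop:hadoop-mapreduce` appears to be a parent POM rather than a jar, so adding it pulls in no classes. The types the compiler can't find (`org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl`, `org.apache.hadoop.mapreduce.server.tasktracker.TTConfig`) ship in `hadoop-mapreduce-client-core`. A sketch of the dependency block with that swapped in (artifact names worth double-checking against Maven Central):

```scala
// Sketch of the libraryDependencies block from SparkTestsBuild.scala,
// with the MapReduce classes pulled from hadoop-mapreduce-client-core
// ("hadoop-mapreduce" is an aggregator POM, not a jar).
libraryDependencies ++= Seq(
  "org.scalatest" %% "scalatest" % "2.2.1" % "test",
  "com.google.guava" % "guava" % "14.0.1",
  "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
  "org.json4s" %% "json4s-native" % "3.2.9",
  "org.apache.hadoop" % "hadoop-client" % "2.5.0",
  "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.5.0"
)
```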


Yours,
Ewan

Re: Spark-perf terasort WIP branch

Posted by Reynold Xin <rx...@databricks.com>.
Hi Ewan,

Sorry it took a while for us to reply. I don't know spark-perf that well,
but I think this would be problematic if it works with only a specific
version of Hadoop. Maybe we can take a different approach -- just have a
bunch of tasks using the HDFS client API to read data, and not relying on
input formats?
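A rough sketch of that approach, assuming `hadoop-client` on the classpath; the function name and structure are illustrative only, not anything from spark-perf:

```scala
// Illustrative sketch: read files from HDFS inside Spark tasks via the
// Hadoop FileSystem client API, with no InputFormat involved, so the
// benchmark avoids version-specific mapreduce internals.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkContext

def readViaHdfsClient(sc: SparkContext, dir: String): Long = {
  // List the files on the driver; open each one inside a task.
  val files = FileSystem.get(new Configuration())
    .listStatus(new Path(dir))
    .filter(_.isFile).map(_.getPath.toString)

  sc.parallelize(files.toSeq, files.length).map { file =>
    // The FileSystem handle is not serializable, so each task makes its own.
    val fs = FileSystem.get(new Configuration())
    val in = fs.open(new Path(file))
    val buf = new Array[Byte](64 * 1024)
    var total = 0L
    var n = in.read(buf)
    while (n >= 0) { total += n; n = in.read(buf) }
    in.close()
    total // bytes read by this task
  }.reduce(_ + _)
}
```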


On Fri, Mar 6, 2015 at 1:41 AM, Ewan Higgs <ew...@ugent.be> wrote:

> Hi all,
> I never heard from anyone on this and have received emails in private that
> people would like to add terasort to their spark-perf installs so it
> becomes part of their cluster validation checks.
>
> Yours,
> Ewan
>
> [forwarded original message trimmed; identical to the post above]
>

Fwd: Spark-perf terasort WIP branch

Posted by Ewan Higgs <ew...@ugent.be>.
Hi all,
I never heard from anyone on this and have received emails in private 
that people would like to add terasort to their spark-perf installs so 
it becomes part of their cluster validation checks.

Yours,
Ewan

-------- Forwarded Message --------
Subject: 	Spark-perf terasort WIP branch
Date: 	Wed, 14 Jan 2015 14:33:45 +0100
From: 	Ewan Higgs <ew...@ugent.be>
To: 	dev@spark.apache.org <de...@spark.apache.org>



[forwarded message body trimmed; identical to the original post above]