Posted to user@spark.apache.org by Ping Liu <pi...@gmail.com> on 2019/12/05 21:27:25 UTC
Is it feasible to build and run Spark on Windows?
Hello,
I understand Spark is preferably built on Linux. But I have a Windows
machine with a slow VirtualBox for Linux. So I wish I were able to build
and run Spark code in a Windows environment.
Unfortunately,
# Apache Hadoop 2.6.X
./build/mvn -Pyarn -DskipTests clean package
# Apache Hadoop 2.7.X and later
./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
Both are listed on
http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
But neither works for me (I stay directly under the Spark root directory and
run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
package").
Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests
clean package"
Now the build works. But when I run spark-shell, I get the following error.
D:\apache\spark\bin>spark-shell
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
    at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
    at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
    at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
    at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Has anyone experienced building and running Spark source code successfully
on Windows? Could you please share your experience?
Thanks a lot!
Ping
Re: Is it feasible to build and run Spark on Windows?
Posted by Deepak Vohra <dv...@yahoo.com.INVALID>.
Multiple Guava versions could be on the classpath, inherited from Hadoop. Use the Guava version supported by Spark and exclude the others. Also set spark.executor.userClassPathFirst=true and spark.driver.userClassPathFirst=true in the Spark properties.
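One way to check whether several Guava versions really are present is to scan the installation tree for guava-*.jar files. A minimal sketch (the helper function and the example path are mine, not from the thread):

```python
# Sketch: find Guava jars (and their versions) under an installation tree.
import os
import re
from collections import defaultdict

def find_guava_jars(root):
    """Map each Guava version found under `root` to the jar paths carrying it."""
    hits = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Jar names look like guava-14.0.1.jar or guava-27.0-jre.jar.
            m = re.match(r"guava-(\d+(?:\.\d+)*).*\.jar$", name)
            if m:
                hits[m.group(1)].append(os.path.join(dirpath, name))
    return dict(hits)

if __name__ == "__main__":
    versions = find_guava_jars(r"D:\apache\spark")  # hypothetical example path
    for version, paths in sorted(versions.items()):
        print(version, "->", paths)
    if len(versions) > 1:
        print("Multiple Guava versions found; exclude all but one.")
```

If more than one version turns up, pinning a single Guava and launching with the userClassPathFirst properties suggested above is one way to control which version wins.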
On Thursday, December 5, 2019, 11:35:27 PM UTC, Ping Liu <pi...@gmail.com> wrote:
Hi Sean,
Oh, sorry. I just came back to Spark home. However, the same error came out.
D:\apache\spark\bin>cd ..
D:\apache\spark>bin\spark-shell
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
D:\apache\spark>
The error shows the JVM cannot find com.google.common.base.Preconditions.checkArgument(boolean, String, Object), a fixed-arity three-parameter overload. But Guava version 19 Preconditions (https://guava.dev/releases/19.0/api/docs/com/google/common/base/Preconditions.html) only documents the varargs form:

static void checkArgument(boolean expression, String errorMessageTemplate, Object... errorMessageArgs)
From the Hadoop Configuration source code here (https://hadoop.apache.org/docs/r2.7.1/api/src-html/org/apache/hadoop/conf/Configuration.html):

public void set(String name, String value, String source) {
  Preconditions.checkArgument(
      name != null,
      "Property name must not be null");
  Preconditions.checkArgument(
      value != null,
      "The value of property " + name + " must not be null");

My best guess was that maybe an old version of Hadoop is used somewhere that might incorrectly call Preconditions.checkArgument(String, Object) but not Preconditions.checkArgument(boolean, String, Object). But this is just my guess.
Thanks.
Ping
On Thu, Dec 5, 2019 at 2:38 PM Sean Owen <sr...@gmail.com> wrote:
No, the build works fine, at least certainly on test machines. As I
say, try running from the actual Spark home, not bin/. You are still
running spark-shell there.
On Thu, Dec 5, 2019 at 4:37 PM Ping Liu <pi...@gmail.com> wrote:
>
> Hi Sean,
>
> Thanks for your response!
>
> Sorry, I didn't mention that "build/mvn ..." doesn't work. So I did go to Spark home directory and ran mvn from there. Following is my build and running result. The source code was just updated yesterday. I guess the POM should specify newer Guava library somehow.
>
> Thanks Sean.
>
> Ping
>
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ........................... SUCCESS [ 14.794 s]
> [INFO] Spark Project Tags ................................. SUCCESS [ 18.233 s]
> [INFO] Spark Project Sketch ............................... SUCCESS [ 20.077 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [ 7.846 s]
> [INFO] Spark Project Networking ........................... SUCCESS [ 14.906 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 6.267 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [ 31.710 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [ 10.227 s]
> [INFO] Spark Project Core ................................. SUCCESS [08:03 min]
> [INFO] Spark Project ML Local Library ..................... SUCCESS [01:51 min]
> [INFO] Spark Project GraphX ............................... SUCCESS [02:20 min]
> [INFO] Spark Project Streaming ............................ SUCCESS [03:16 min]
> [INFO] Spark Project Catalyst ............................. SUCCESS [08:45 min]
> [INFO] Spark Project SQL .................................. SUCCESS [12:12 min]
> [INFO] Spark Project ML Library ........................... SUCCESS [ 16:28 h]
> [INFO] Spark Project Tools ................................ SUCCESS [ 23.602 s]
> [INFO] Spark Project Hive ................................. SUCCESS [07:50 min]
> [INFO] Spark Project Graph API ............................ SUCCESS [ 8.734 s]
> [INFO] Spark Project Cypher ............................... SUCCESS [ 12.420 s]
> [INFO] Spark Project Graph ................................ SUCCESS [ 10.186 s]
> [INFO] Spark Project REPL ................................. SUCCESS [01:03 min]
> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:19 min]
> [INFO] Spark Project YARN ................................. SUCCESS [02:19 min]
> [INFO] Spark Project Assembly ............................. SUCCESS [ 18.912 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 57.925 s]
> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:20 min]
> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:26 min]
> [INFO] Spark Project Examples ............................. SUCCESS [02:00 min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 28.354 s]
> [INFO] Spark Avro ......................................... SUCCESS [01:44 min]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 17:30 h
> [INFO] Finished at: 2019-12-05T12:20:01-08:00
> [INFO] ------------------------------------------------------------------------
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>ls
> beeline load-spark-env.cmd run-example spark-shell spark-sql2.cmd sparkR.cmd
> beeline.cmd load-spark-env.sh run-example.cmd spark-shell.cmd spark-submit sparkR2.cmd
> docker-image-tool.sh pyspark spark-class spark-shell2.cmd spark-submit.cmd
> find-spark-home pyspark.cmd spark-class.cmd spark-sql spark-submit2.cmd
> find-spark-home.cmd pyspark2.cmd spark-class2.cmd spark-sql.cmd sparkR
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
> at scala.Option.getOrElse(Option.scala:189)
> at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> D:\apache\spark\bin>
>
> On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
>>
>> What was the build error? You didn't say. Are you sure it succeeded?
>> Try running from the Spark home dir, not bin.
>> I know we do run Windows tests and it appears to pass tests, etc.
>>
>> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
>> >
>> > Hello,
>> >
>> > I understand Spark is preferably built on Linux. But I have a Windows machine with a slow Virtual Box for Linux. So I wish I am able to build and run Spark code on Windows environment.
>> >
>> > Unfortunately,
>> >
>> > # Apache Hadoop 2.6.X
>> > ./build/mvn -Pyarn -DskipTests clean package
>> >
>> > # Apache Hadoop 2.7.X and later
>> > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
>> >
>> >
>> > Both are listed on http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
>> >
>> > But neither works for me (I stay directly under spark root directory and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package"
>> >
>> > and
>> >
>> > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests clean package"
>> >
>> > Now build works. But when I run spark-shell. I got the following error.
>> >
>> > D:\apache\spark\bin>spark-shell
>> > Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
>> > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
>> > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
>> > at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
>> > at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
>> > at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>> > at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
>> > at scala.Option.getOrElse(Option.scala:189)
>> > at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
>> > at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
>> > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>> > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>> > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>> > at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>> > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> >
>> >
>> > Has anyone experienced building and running Spark source code successfully on Windows? Could you please share your experience?
>> >
>> > Thanks a lot!
>> >
>> > Ping
>> >
Re: Is it feasible to build and run Spark on Windows?
Posted by Ping Liu <pi...@gmail.com>.
Hi Sean,
Oh, sorry. I just came back to Spark home. However, the same error came
out.
D:\apache\spark\bin>cd ..
D:\apache\spark>bin\spark-shell
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
    at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
    at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
    at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
    at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
D:\apache\spark>
The error shows the JVM cannot find com.google.common.base.Preconditions.checkArgument(boolean, String, Object), a fixed-arity three-parameter overload. But Guava version 19 Preconditions (https://guava.dev/releases/19.0/api/docs/com/google/common/base/Preconditions.html) only documents the varargs form:

static void checkArgument(boolean expression, String errorMessageTemplate, Object... errorMessageArgs)
From the Hadoop Configuration source code here (https://hadoop.apache.org/docs/r2.7.1/api/src-html/org/apache/hadoop/conf/Configuration.html):

public void set(String name, String value, String source) {
  Preconditions.checkArgument(
      name != null,
      "Property name must not be null");
  Preconditions.checkArgument(
      value != null,
      "The value of property " + name + " must not be null");

My best guess was that maybe an old version of Hadoop is used somewhere that might incorrectly call Preconditions.checkArgument(String, Object) but not Preconditions.checkArgument(boolean, String, Object). But this is just my guess.
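For what it is worth, the parameter list in a NoSuchMethodError can be read straight out of the JVM method descriptor: Z is boolean, Ljava/lang/String; is String, and the trailing V is the void return type. A small decoding helper (mine, not part of the thread) makes this mechanical:

```python
# Sketch: decode a JVM method descriptor such as the one in the error above.
def decode_descriptor(desc):
    """Return ([parameter type names], return type name) for a JVM descriptor."""
    base = {"Z": "boolean", "B": "byte", "C": "char", "S": "short",
            "I": "int", "J": "long", "F": "float", "D": "double", "V": "void"}
    params_part, ret_part = desc[1:].split(")")

    def read_one(s, i):
        dims = 0
        while s[i] == "[":          # array dimensions
            dims += 1
            i += 1
        if s[i] == "L":             # object type: L<internal/name>;
            end = s.index(";", i)
            t = s[i + 1:end].replace("/", ".")
            i = end + 1
        else:                       # primitive type
            t = base[s[i]]
            i += 1
        return t + "[]" * dims, i

    types, i = [], 0
    while i < len(params_part):
        t, i = read_one(params_part, i)
        types.append(t)
    ret_type, _ = read_one(ret_part, 0)
    return types, ret_type

params, ret = decode_descriptor("(ZLjava/lang/String;Ljava/lang/Object;)V")
print(", ".join(params), "->", ret)
# prints: boolean, java.lang.String, java.lang.Object -> void
```

So the missing method takes three parameters (boolean, String, a single Object) and returns void, i.e. a fixed-arity overload that older Guava releases lack, consistent with an old Guava sitting on the classpath.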
Thanks.
Ping
On Thu, Dec 5, 2019 at 2:38 PM Sean Owen <sr...@gmail.com> wrote:
> No, the build works fine, at least certainly on test machines. As I
> say, try running from the actual Spark home, not bin/. You are still
> running spark-shell there.
>
> On Thu, Dec 5, 2019 at 4:37 PM Ping Liu <pi...@gmail.com> wrote:
> >
> > Hi Sean,
> >
> > Thanks for your response!
> >
> > Sorry, I didn't mention that "build/mvn ..." doesn't work. So I did go
> to Spark home directory and ran mvn from there. Following is my build and
> running result. The source code was just updated yesterday. I guess the
> POM should specify newer Guava library somehow.
> >
> > Thanks Sean.
> >
> > Ping
> >
> > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> > [INFO]
> > [INFO] Spark Project Parent POM ........................... SUCCESS [
> 14.794 s]
> > [INFO] Spark Project Tags ................................. SUCCESS [
> 18.233 s]
> > [INFO] Spark Project Sketch ............................... SUCCESS [
> 20.077 s]
> > [INFO] Spark Project Local DB ............................. SUCCESS [
> 7.846 s]
> > [INFO] Spark Project Networking ........................... SUCCESS [
> 14.906 s]
> > [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
> 6.267 s]
> > [INFO] Spark Project Unsafe ............................... SUCCESS [
> 31.710 s]
> > [INFO] Spark Project Launcher ............................. SUCCESS [
> 10.227 s]
> > [INFO] Spark Project Core ................................. SUCCESS
> [08:03 min]
> > [INFO] Spark Project ML Local Library ..................... SUCCESS
> [01:51 min]
> > [INFO] Spark Project GraphX ............................... SUCCESS
> [02:20 min]
> > [INFO] Spark Project Streaming ............................ SUCCESS
> [03:16 min]
> > [INFO] Spark Project Catalyst ............................. SUCCESS
> [08:45 min]
> > [INFO] Spark Project SQL .................................. SUCCESS
> [12:12 min]
> > [INFO] Spark Project ML Library ........................... SUCCESS [
> 16:28 h]
> > [INFO] Spark Project Tools ................................ SUCCESS [
> 23.602 s]
> > [INFO] Spark Project Hive ................................. SUCCESS
> [07:50 min]
> > [INFO] Spark Project Graph API ............................ SUCCESS [
> 8.734 s]
> > [INFO] Spark Project Cypher ............................... SUCCESS [
> 12.420 s]
> > [INFO] Spark Project Graph ................................ SUCCESS [
> 10.186 s]
> > [INFO] Spark Project REPL ................................. SUCCESS
> [01:03 min]
> > [INFO] Spark Project YARN Shuffle Service ................. SUCCESS
> [01:19 min]
> > [INFO] Spark Project YARN ................................. SUCCESS
> [02:19 min]
> > [INFO] Spark Project Assembly ............................. SUCCESS [
> 18.912 s]
> > [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [
> 57.925 s]
> > [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS
> [01:20 min]
> > [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS
> [02:26 min]
> > [INFO] Spark Project Examples ............................. SUCCESS
> [02:00 min]
> > [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [
> 28.354 s]
> > [INFO] Spark Avro ......................................... SUCCESS
> [01:44 min]
> > [INFO]
> ------------------------------------------------------------------------
> > [INFO] BUILD SUCCESS
> > [INFO]
> ------------------------------------------------------------------------
> > [INFO] Total time: 17:30 h
> > [INFO] Finished at: 2019-12-05T12:20:01-08:00
> > [INFO]
> ------------------------------------------------------------------------
> >
> > D:\apache\spark>cd bin
> >
> > D:\apache\spark\bin>ls
> > beeline load-spark-env.cmd run-example spark-shell
> spark-sql2.cmd sparkR.cmd
> > beeline.cmd load-spark-env.sh run-example.cmd
> spark-shell.cmd spark-submit sparkR2.cmd
> > docker-image-tool.sh pyspark spark-class
> spark-shell2.cmd spark-submit.cmd
> > find-spark-home pyspark.cmd spark-class.cmd spark-sql
> spark-submit2.cmd
> > find-spark-home.cmd pyspark2.cmd spark-class2.cmd
> spark-sql.cmd sparkR
> >
> > D:\apache\spark\bin>spark-shell
> > Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> > at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> > at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown
> Source)
> > at scala.Option.getOrElse(Option.scala:189)
> > at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> > at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> > at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> > at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> > at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> > at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> > at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >
> > D:\apache\spark\bin>
> >
> > On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
> >>
> >> What was the build error? You didn't say. Are you sure it succeeded?
> >> Try running from the Spark home dir, not bin.
> >> I know we do run Windows tests and it appears to pass tests, etc.
> >>
> >> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
> >> >
> >> > Hello,
> >> >
> >> > I understand Spark is preferably built on Linux. But I have a
> Windows machine with a slow Virtual Box for Linux. So I wish I am able to
> build and run Spark code on Windows environment.
> >> >
> >> > Unfortunately,
> >> >
> >> > # Apache Hadoop 2.6.X
> >> > ./build/mvn -Pyarn -DskipTests clean package
> >> >
> >> > # Apache Hadoop 2.7.X and later
> >> > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests
> clean package
> >> >
> >> >
> >> > Both are listed on
> http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
> >> >
> >> > But neither works for me (I stay directly under spark root directory
> and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package"
> >> >
> >> > and
> >> >
> >> > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1
> -DskipTests clean package"
> >> >
> >> > Now build works. But when I run spark-shell. I got the following
> error.
> >> >
> >> > D:\apache\spark\bin>spark-shell
> >> > Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> >> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> >> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> >> > at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> >> > at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> >> > at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> >> > at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown
> Source)
> >> > at scala.Option.getOrElse(Option.scala:189)
> >> > at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> >> > at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> >> > at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> >> > at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> >> > at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> >> > at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> >> > at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> >> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >> >
> >> >
> >> > Has anyone experienced building and running Spark source code
> successfully on Windows? Could you please share your experience?
> >> >
> >> > Thanks a lot!
> >> >
> >> > Ping
> >> >
>
> > at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> > at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> > at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> > at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> > at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> > at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> > at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >
> > D:\apache\spark\bin>
> >
> > On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
> >>
> >> What was the build error? You didn't say. Are you sure it succeeded?
> >> Try running from the Spark home dir, not bin.
> >> I know we do run Windows tests and it appears to pass tests, etc.
> >>
> >> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
> >> >
> >> > Hello,
> >> >
> >> > I understand Spark is preferably built on Linux. But I have a
> Windows machine with a slow Virtual Box for Linux. So I wish I am able to
> build and run Spark code on Windows environment.
> >> >
> >> > Unfortunately,
> >> >
> >> > # Apache Hadoop 2.6.X
> >> > ./build/mvn -Pyarn -DskipTests clean package
> >> >
> >> > # Apache Hadoop 2.7.X and later
> >> > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
> >> >
> >> > Both are listed on http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
> >> >
> >> > But neither works for me (I stay directly under the spark root directory and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package")
> >> >
> >> > and
> >> >
> >> > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests clean package"
> >> >
> >> > Now the build works. But when I run spark-shell, I got the following error.
> >> >
> >> > D:\apache\spark\bin>spark-shell
> >> > Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> >> >         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> >> >         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> >> >         at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> >> >         at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> >> >         at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> >> >         at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
> >> >         at scala.Option.getOrElse(Option.scala:189)
> >> >         at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> >> >         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> >> >         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> >> >         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> >> >         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> >> >         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> >> >         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> >> >         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >> >
> >> >
> >> > Has anyone experienced building and running Spark source code
> successfully on Windows? Could you please share your experience?
> >> >
> >> > Thanks a lot!
> >> >
> >> > Ping
> >> >
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Sean Owen <sr...@gmail.com>.
No, the build works fine, at least certainly on test machines. As I
say, try running from the actual Spark home, not bin/. You are still
running spark-shell there.
On Thu, Dec 5, 2019 at 4:37 PM Ping Liu <pi...@gmail.com> wrote:
>
> Hi Sean,
>
> Thanks for your response!
>
> Sorry, I didn't mention that "build/mvn ..." doesn't work. So I did go to the Spark home directory and ran mvn from there. Following is my build and run result. The source code was just updated yesterday. I guess the POM should specify a newer Guava library somehow.
>
> Thanks Sean.
>
> Ping
>
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ........................... SUCCESS [ 14.794 s]
> [INFO] Spark Project Tags ................................. SUCCESS [ 18.233 s]
> [INFO] Spark Project Sketch ............................... SUCCESS [ 20.077 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [ 7.846 s]
> [INFO] Spark Project Networking ........................... SUCCESS [ 14.906 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 6.267 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [ 31.710 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [ 10.227 s]
> [INFO] Spark Project Core ................................. SUCCESS [08:03 min]
> [INFO] Spark Project ML Local Library ..................... SUCCESS [01:51 min]
> [INFO] Spark Project GraphX ............................... SUCCESS [02:20 min]
> [INFO] Spark Project Streaming ............................ SUCCESS [03:16 min]
> [INFO] Spark Project Catalyst ............................. SUCCESS [08:45 min]
> [INFO] Spark Project SQL .................................. SUCCESS [12:12 min]
> [INFO] Spark Project ML Library ........................... SUCCESS [ 16:28 h]
> [INFO] Spark Project Tools ................................ SUCCESS [ 23.602 s]
> [INFO] Spark Project Hive ................................. SUCCESS [07:50 min]
> [INFO] Spark Project Graph API ............................ SUCCESS [ 8.734 s]
> [INFO] Spark Project Cypher ............................... SUCCESS [ 12.420 s]
> [INFO] Spark Project Graph ................................ SUCCESS [ 10.186 s]
> [INFO] Spark Project REPL ................................. SUCCESS [01:03 min]
> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:19 min]
> [INFO] Spark Project YARN ................................. SUCCESS [02:19 min]
> [INFO] Spark Project Assembly ............................. SUCCESS [ 18.912 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 57.925 s]
> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:20 min]
> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:26 min]
> [INFO] Spark Project Examples ............................. SUCCESS [02:00 min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 28.354 s]
> [INFO] Spark Avro ......................................... SUCCESS [01:44 min]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 17:30 h
> [INFO] Finished at: 2019-12-05T12:20:01-08:00
> [INFO] ------------------------------------------------------------------------
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>ls
> beeline load-spark-env.cmd run-example spark-shell spark-sql2.cmd sparkR.cmd
> beeline.cmd load-spark-env.sh run-example.cmd spark-shell.cmd spark-submit sparkR2.cmd
> docker-image-tool.sh pyspark spark-class spark-shell2.cmd spark-submit.cmd
> find-spark-home pyspark.cmd spark-class.cmd spark-sql spark-submit2.cmd
> find-spark-home.cmd pyspark2.cmd spark-class2.cmd spark-sql.cmd sparkR
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
> at scala.Option.getOrElse(Option.scala:189)
> at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> D:\apache\spark\bin>
>
> On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
>>
>> What was the build error? You didn't say. Are you sure it succeeded?
>> Try running from the Spark home dir, not bin.
>> I know we do run Windows tests and it appears to pass tests, etc.
>>
>> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
>> >
>> > Hello,
>> >
>> > I understand Spark is preferably built on Linux. But I have a Windows machine with a slow Virtual Box for Linux. So I wish I am able to build and run Spark code on Windows environment.
>> >
>> > Unfortunately,
>> >
>> > # Apache Hadoop 2.6.X
>> > ./build/mvn -Pyarn -DskipTests clean package
>> >
>> > # Apache Hadoop 2.7.X and later
>> > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
>> >
>> >
>> > Both are listed on http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
>> >
>> > But neither works for me (I stay directly under the spark root directory and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package")
>> >
>> > and
>> >
>> > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests clean package"
>> >
>> > Now the build works. But when I run spark-shell, I got the following error.
>> >
>> > D:\apache\spark\bin>spark-shell
>> > Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
>> > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
>> > at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
>> > at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
>> > at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
>> > at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>> > at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
>> > at scala.Option.getOrElse(Option.scala:189)
>> > at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
>> > at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
>> > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>> > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>> > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>> > at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>> > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> >
>> >
>> > Has anyone experienced building and running Spark source code successfully on Windows? Could you please share your experience?
>> >
>> > Thanks a lot!
>> >
>> > Ping
>> >
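The error above is a classic binary-compatibility clash: the `checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V` overload Hadoop calls was only added in Guava 20, while Spark at the time bundled Guava 14, so whichever Guava jar wins on the classpath decides whether the call resolves. A small standalone probe can show which jar the JVM actually loads `Preconditions` from. This is a hypothetical diagnostic, not part of Spark; the class name `WhichGuava` is made up for illustration.

```java
// Hypothetical diagnostic (not part of Spark): report which jar the JVM
// loads Guava's Preconditions class from, so you can see which of several
// Guava versions on the classpath actually wins.
public class WhichGuava {

    // Returns a human-readable description of where Preconditions came from,
    // or notes that Guava is absent from the classpath entirely.
    static String locateGuava() {
        try {
            Class<?> c = Class.forName("com.google.common.base.Preconditions");
            java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null
                ? "Preconditions loaded from the bootstrap/platform loader"
                : "Preconditions loaded from: " + src.getLocation();
        } catch (ClassNotFoundException e) {
            return "Guava is not on the classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println(locateGuava());
    }
}
```

Run it with the same classpath spark-shell would use (for example `java -cp "jars/*;." WhichGuava` from the Spark home on Windows); if the reported jar is an old Guava (14.x), the NoSuchMethodError above follows.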
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: Is it feasible to build and run Spark on Windows?
Posted by Ping Liu <pi...@gmail.com>.
I am new and plan to be an individual contributor for bug fixes. I assume I
need to build the project if I'll be working on source code from the master
branch, which is ahead of the released binaries. Does that make sense?
Please let me know if, in this case, I can still use the binaries instead of
building the project.
On Tue, Dec 10, 2019 at 7:00 AM Deepak Vohra <dv...@yahoo.com> wrote:
> The initial question was to build from source. Any reason to build when
> binaries are available at https://spark.apache.org/downloads.html
>
> On Tuesday, December 10, 2019, 03:05:44 AM UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
>
>
> Super. Thanks Deepak!
>
> On Mon, Dec 9, 2019 at 6:58 PM Deepak Vohra <dv...@yahoo.com> wrote:
>
> Please install Apache Spark on Windows as discussed in Apache Spark on
> Windows - DZone Open Source
> <https://dzone.com/articles/working-on-apache-spark-on-windows>
>
>
>
>
> On Monday, December 9, 2019, 11:27:53 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
>
>
> Thanks Deepak! Yes, I want to try it with Docker. But my AWS account ran out of its free period. Is there a shared EC2 instance for Spark that we can use for free?
>
> Ping
>
>
> On Monday, December 9, 2019, Deepak Vohra <dv...@yahoo.com> wrote:
> > Haven't tested, but the general procedure is to exclude all Guava dependencies that are not needed. The hadoop-common dependency does not have a dependency on Guava according to Maven Repository: org.apache.hadoop » hadoop-common
> >
> >
> > Apache Spark 2.4 has a dependency on Guava 14.
> > If a Docker image for Cloudera Hadoop is used, Spark may be installed on Docker for Windows.
> > For Docker on Windows on EC2, refer to Getting Started with Docker for Windows - Developer.com
> >
> >
> > Conflicting versions are not an issue if Docker is used.
> > "Apache Spark applications usually have a complex set of required
> software dependencies. Spark applications may require specific versions of
> these dependencies (such as Pyspark and R) on the Spark executor hosts,
> sometimes with conflicting versions."
> > Running Spark in Docker Containers on YARN
> >
> > On Monday, December 9, 2019, 08:37:47 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
> >
> > Hi Deepak,
> > I tried it. Unfortunately, it still doesn't work. The 28.1-jre version isn't downloaded for some reason. I'll try something else. Thank you very much for your help!
> > Ping
> >
> > On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra <dv...@yahoo.com> wrote:
> >
> > As multiple Guava versions are found, exclude Guava from all the dependencies it could have been downloaded with, and explicitly add a recent Guava version.
> >   <dependency>
> >     <groupId>org.apache.hadoop</groupId>
> >     <artifactId>hadoop-common</artifactId>
> >     <version>3.2.1</version>
> >     <exclusions>
> >       <exclusion>
> >         <groupId>com.google.guava</groupId>
> >         <artifactId>guava</artifactId>
> >       </exclusion>
> >     </exclusions>
> >   </dependency>
> >   <dependency>
> >     <groupId>com.google.guava</groupId>
> >     <artifactId>guava</artifactId>
> >     <version>28.1-jre</version>
> >   </dependency>
> > </dependencies>
> > </dependencyManagement>
> >
> > On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
> >
> > Hi Deepak,
> > Following your suggestion, I put the exclusion of Guava in the topmost POM (directly under the Spark home) as follows.
> > 2227- </dependency>
> > 2228- <dependency>
> > 2229- <groupId>org.apache.hadoop</groupId>
> > 2230: <artifactId>hadoop-common</artifactId>
> > 2231- <version>3.2.1</version>
> > 2232- <exclusions>
> > 2233- <exclusion>
> > 2234- <groupId>com.google.guava</groupId>
> > 2235- <artifactId>guava</artifactId>
> > 2236- </exclusion>
> > 2237- </exclusions>
> > 2238- </dependency>
> > 2239- </dependencies>
> > 2240- </dependencyManagement>
> > I also set properties for spark.executor.userClassPathFirst=true and
> spark.driver.userClassPathFirst=true
> > D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1
> -Dspark.executor.userClassPathFirst=true
> -Dspark.driver.userClassPathFirst=true -DskipTests clean package
> > and rebuilt spark.
> > But I got the same error when running spark-shell.
> >
> > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> > [INFO]
> > [INFO] Spark Project Parent POM ........................... SUCCESS [ 25.092 s]
> > [INFO] Spark Project Tags ................................. SUCCESS [ 22.093 s]
> > [INFO] Spark Project Sketch ............................... SUCCESS [ 19.546 s]
> > [INFO] Spark Project Local DB ............................. SUCCESS [ 10.468 s]
> > [INFO] Spark Project Networking ........................... SUCCESS [ 17.733 s]
> > [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 6.531 s]
> > [INFO] Spark Project Unsafe ............................... SUCCESS [ 25.327 s]
> > [INFO] Spark Project Launcher ............................. SUCCESS [ 27.264 s]
> > [INFO] Spark Project Core ................................. SUCCESS [07:59 min]
> > [INFO] Spark Project ML Local Library ..................... SUCCESS [01:39 min]
> > [INFO] Spark Project GraphX ............................... SUCCESS [02:08 min]
> > [INFO] Spark Project Streaming ............................ SUCCESS [02:56 min]
> > [INFO] Spark Project Catalyst ............................. SUCCESS [08:55 min]
> > [INFO] Spark Project SQL .................................. SUCCESS [12:33 min]
> > [INFO] Spark Project ML Library ........................... SUCCESS [08:49 min]
> > [INFO] Spark Project Tools ................................ SUCCESS [ 16.967 s]
> > [INFO] Spark Project Hive ................................. SUCCESS [06:15 min]
> > [INFO] Spark Project Graph API ............................ SUCCESS [ 10.219 s]
> > [INFO] Spark Project Cypher ............................... SUCCESS [ 11.952 s]
> > [INFO] Spark Project Graph ................................ SUCCESS [ 11.171 s]
> > [INFO] Spark Project REPL ................................. SUCCESS [ 55.029 s]
> > [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:07 min]
> > [INFO] Spark Project YARN ................................. SUCCESS [02:22 min]
> > [INFO] Spark Project Assembly ............................. SUCCESS [ 21.483 s]
> > [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 56.450 s]
> > [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:21 min]
> > [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:33 min]
> > [INFO] Spark Project Examples ............................. SUCCESS [02:05 min]
> > [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 30.780 s]
> > [INFO] Spark Avro ......................................... SUCCESS [01:43 min]
> > [INFO] ------------------------------------------------------------------------
> > [INFO] BUILD SUCCESS
> > [INFO] ------------------------------------------------------------------------
> > [INFO] Total time: 01:08 h
> > [INFO] Finished at: 2019-12-06T11:43:08-08:00
> > [INFO] ------------------------------------------------------------------------
> >
> > D:\apache\spark>spark-shell
> > 'spark-shell' is not recognized as an internal or external command,
> > operable program or batch file.
> >
> > D:\apache\spark>cd bin
> >
> > D:\apache\spark\bin>spark-shell
> > Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> >         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> >         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> >         at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> >         at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> >         at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> >         at org.apache.spark.deploy.SparkSubmit$$Lambda$132/1985836631.apply(Unknown Source)
> >         at scala.Option.getOrElse(Option.scala:189)
> >         at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> >         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> >         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> >         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> >         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> >         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> >         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> >         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> > Before building spark, I went to my local Maven repo and removed guava
> entirely. But after building, I found the same versions of guava had been
> downloaded again.
> > D:\mavenrepo\com\google\guava\guava>ls
> > 14.0.1 16.0.1 18.0 19.0
> > On Thu, Dec 5, 2019 at 5:12 PM Deepak Vohra <dv...@yahoo.com> wrote:
> >
> > Just to clarify, excluding Hadoop provided guava in pom.xml is an
> alternative to using an Uber jar, which is a more involved process.
> >
> > On Thursday, December 5, 2019, 10:37:39 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
> >
> > Hi Sean,
> > Thanks for your response!
> > Sorry, I didn't mention that "build/mvn ..." doesn't work. So I did go
> to Spark home directory and ran mvn from there. Following is my build and
> running result. The source code was just updated yesterday. I guess the
> POM should specify a newer Guava library somehow.
> >
> > Thanks Sean.
> > Ping
> > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> > [INFO]
> > [INFO] Spark Project Parent POM ........................... SUCCESS [
> 14.794 s]
> > [INFO] Spark Project Tags ................................. SUCCESS [
> 18.233 s]
> > [INFO] Spark Project Sketch ............................... SUCCESS [
> 20.077 s]
> > [INFO] Spark Project Local DB ............................. SUCCESS [
> 7.846 s]
> > [INFO] Spark Project Networking ........................... SUCCESS [
> 14.906 s]
> > [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
> 6.267 s]
> > [INFO] Spark Project Unsafe ............................... SUCCESS [
> 31.710 s]
> > [INFO] Spark Project Launcher ............................. SUCCESS [
> 10.227 s]
> > [INFO] Spark Project Core ................................. SUCCESS
> [08:03 min]
> > [INFO] Spark Project ML Local Library ..................... SUCCESS
> [01:51 min]
> > [INFO] Spark Project GraphX ............................... SUCCESS
> [02:20 min]
> > [INFO] Spark Project Streaming ............................ SUCCESS
> [03:16 min]
> > [INFO] Spark Project Catalyst ............................. SUCCESS
> [08:45 min]
> > [INFO] Spark Project SQL .................................. SUCCESS
> [12:12 min]
> > [INFO] Spark Project ML Library ........................... SUCCESS [
> 16:28 h]
> > [INFO] Spark Project Tools ................................ SUCCESS [
> 23.602 s]
> > [INFO] Spark Project Hive ................................. SUCCESS
> [07:50 min]
> > [INFO] Spark Project Graph API ............................ SUCCESS [
> 8.734 s]
> > [INFO] Spark Project Cypher ............................... SUCCESS [
> 12.420 s]
> > [INFO] Spark Project Graph ................................ SUCCESS [
> 10.186 s]
> > [INFO] Spark Project REPL ................................. SUCCESS
> [01:03 min]
> > [INFO] Spark Project YARN Shuffle Service ................. SUCCESS
> [01:19 min]
> > [INFO] Spark Project YARN ................................. SUCCESS
> [02:19 min]
> > [INFO] Spark Project Assembly ............................. SUCCESS [
> 18.912 s]
> > [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [
> 57.925 s]
> > [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS
> [01:20 min]
> > [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS
> [02:26 min]
> > [INFO] Spark Project Examples ............................. SUCCESS
> [02:00 min]
> > [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [
> 28.354 s]
> > [INFO] Spark Avro ......................................... SUCCESS
> [01:44 min]
> > [INFO]
> ------------------------------------------------------------------------
> > [INFO] BUILD SUCCESS
> > [INFO]
> ------------------------------------------------------------------------
> > [INFO] Total time: 17:30 h
> > [INFO] Finished at: 2019-12-05T12:20:01-08:00
> > [INFO]
> ------------------------------------------------------------------------
> >
> > D:\apache\spark>cd bin
> >
> > D:\apache\spark\bin>ls
> > beeline load-spark-env.cmd run-example spark-shell
> spark-sql2.cmd sparkR.cmd
> > beeline.cmd load-spark-env.sh run-example.cmd
> spark-shell.cmd spark-submit sparkR2.cmd
> > docker-image-tool.sh pyspark spark-class
> spark-shell2.cmd spark-submit.cmd
> > find-spark-home pyspark.cmd spark-class.cmd spark-sql
> spark-submit2.cmd
> > find-spark-home.cmd pyspark2.cmd spark-class2.cmd
> spark-sql.cmd sparkR
> >
> > D:\apache\spark\bin>spark-shell
> > Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> > at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> > at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown
> Source)
> > at scala.Option.getOrElse(Option.scala:189)
> > at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> > at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> > at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> > at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> > at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> > at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> > at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >
> > D:\apache\spark\bin>
> > On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
> >
> > What was the build error? You didn't say. Are you sure it succeeded?
> > Try running from the Spark home dir, not bin.
> > I know we do run Windows tests and it appears to pass tests, etc.
> >
> > On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> I understand Spark is preferably built on Linux. But I have a Windows
> machine with a slow VirtualBox VM for Linux, so I hope to be able to build
> and run Spark on a Windows environment.
> >>
> >> Unfortunately,
> >>
> >> # Apache Hadoop 2.6.X
> >> ./build/mvn -Pyarn -DskipTests clean package
> >>
> >> # Apache Hadoop 2.7.X and later
> >> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests
> clean package
> >>
> >>
> >> Both are listed on
> http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
> >>
> >> But neither works for me (I stay directly under the spark root directory
> and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package").
> >>
> >> I then tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1
> -DskipTests clean package"
> >>
> >> Now the build works. But when I run spark-shell, I get the following
> error.
> >>
> >> D:\apache\spark\bin>spark-shell
> >> Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> >> at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> >> at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> >> at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> >> at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> >> at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> >> at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown
> Source)
> >> at scala.Option.getOrElse(Option.scala:189)
> >> at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> >> at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> >> at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> >> at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> >> at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> >> at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> >> at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> >> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >>
> >>
> >> Has anyone experienced building and running Spark source code
> successfully on Windows? Could you please share your experience?
> >>
> >> Thanks a lot!
> >>
> >> Ping
> >>
> >
>
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Ping Liu <pi...@gmail.com>.
I am new and plan to be an individual contributor for bug fixes. I assume I
need to build the project since I'll be working on source code from the
master branch, which is ahead of the binaries. Do you think this makes sense?
Please let me know if, in this case, I can still use a binary instead of
building the project.
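Since building from source means Maven resolves dependencies locally, one quick way to see which guava versions a build actually pulled down is to list the version directories in the local repository. A minimal sketch, assuming the default repository location (adjust the path, e.g. to D:\mavenrepo, if yours differs):

```python
# Sketch: list which guava versions are present in a local Maven repository
# after a build. The repository path passed in is an assumption -- adjust it
# to your own setup (e.g. D:\mavenrepo or ~/.m2/repository).
import os

def guava_versions(repo_root):
    """Return the sorted version directories under com/google/guava/guava."""
    guava_dir = os.path.join(repo_root, "com", "google", "guava", "guava")
    if not os.path.isdir(guava_dir):
        return []
    return sorted(
        name for name in os.listdir(guava_dir)
        if os.path.isdir(os.path.join(guava_dir, name))
    )

if __name__ == "__main__":
    versions = guava_versions(os.path.expanduser("~/.m2/repository"))
    if len(versions) > 1:
        print("multiple guava versions resolved:", versions)
```

Running this after a build makes a mix like 14.0.1/16.0.1/18.0/19.0, as reported later in the thread, easy to spot.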
On Tue, Dec 10, 2019 at 7:00 AM Deepak Vohra <dv...@yahoo.com> wrote:
> The initial question was to build from source. Any reason to build when
> binaries are available at https://spark.apache.org/downloads.html
>
> On Tuesday, December 10, 2019, 03:05:44 AM UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
>
>
> Super. Thanks Deepak!
>
> On Mon, Dec 9, 2019 at 6:58 PM Deepak Vohra <dv...@yahoo.com> wrote:
>
> Please install Apache Spark on Windows as discussed in Apache Spark on
> Windows - DZone Open Source
> <https://dzone.com/articles/working-on-apache-spark-on-windows>
>
> Apache Spark on Windows - DZone Open Source
>
> This article explains and provides solutions for some of the most common
> errors developers come across when inst...
> <https://dzone.com/articles/working-on-apache-spark-on-windows>
>
>
>
> On Monday, December 9, 2019, 11:27:53 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
>
>
> Thanks Deepak! Yes, I want to try it with Docker. But my AWS account ran
> out of free period. Is there a shared EC2 for Spark that we can use for
> free?
>
> Ping
>
>
> On Monday, December 9, 2019, Deepak Vohra <dv...@yahoo.com> wrote:
> > Haven't tested but the general procedure is to exclude all guava
> dependencies that are not needed. The hadoop-common dependency does not have
> a dependency on guava according to Maven Repository: org.apache.hadoop »
> hadoop-common
> >
> > Maven Repository: org.apache.hadoop » hadoop-common
> >
> > Apache Spark 2.4 has a dependency on guava 14.
> > If a Docker image for Cloudera Hadoop is used, Spark may be installed
> on Docker for Windows.
> > For Docker on Windows on EC2 refer Getting Started with Docker for
> Windows - Developer.com
> >
> > Getting Started with Docker for Windows - Developer.com
> >
> > Docker for Windows makes it feasible to run a Docker daemon on Windows
> Server 2016. Learn to harness its power.
> >
> >
> > Conflicting versions is not an issue if Docker is used.
> > "Apache Spark applications usually have a complex set of required
> software dependencies. Spark applications may require specific versions of
> these dependencies (such as Pyspark and R) on the Spark executor hosts,
> sometimes with conflicting versions."
> > Running Spark in Docker Containers on YARN
> >
> > Running Spark in Docker Containers on YARN
> >
> >
> >
> >
> >
> > On Monday, December 9, 2019, 08:37:47 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
> >
> > Hi Deepak,
> > I tried it. Unfortunately, it still doesn't work. 28.1-jre isn't
> downloaded for some reason. I'll try something else. Thank you very much for
> your help!
> > Ping
> >
> > On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra <dv...@yahoo.com> wrote:
> >
> > As multiple guava versions are found, exclude guava from all the
> dependencies it could have been pulled in with, and explicitly add a recent
> guava version.
> > <dependency>
> > <groupId>org.apache.hadoop</groupId>
> > <artifactId>hadoop-common</artifactId>
> > <version>3.2.1</version>
> > <exclusions>
> > <exclusion>
> > <groupId>com.google.guava</groupId>
> > <artifactId>guava</artifactId>
> > </exclusion>
> > </exclusions>
> > </dependency>
> > <dependency>
> > <groupId>com.google.guava</groupId>
> > <artifactId>guava</artifactId>
> > <version>28.1-jre</version>
> > </dependency>
> > </dependencies>
> > </dependencyManagement>
> >
> > On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
> >
> > Hi Deepak,
> > Following your suggestion, I put the exclusion of guava in the topmost POM
> (directly under the Spark home directory) as follows.
> > 2227- </dependency>
> > 2228- <dependency>
> > 2229- <groupId>org.apache.hadoop</groupId>
> > 2230: <artifactId>hadoop-common</artifactId>
> > 2231- <version>3.2.1</version>
> > 2232- <exclusions>
> > 2233- <exclusion>
> > 2234- <groupId>com.google.guava</groupId>
> > 2235- <artifactId>guava</artifactId>
> > 2236- </exclusion>
> > 2237- </exclusions>
> > 2238- </dependency>
> > 2239- </dependencies>
> > 2240- </dependencyManagement>
> > I also set properties for spark.executor.userClassPathFirst=true and
> spark.driver.userClassPathFirst=true
> > D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1
> -Dspark.executor.userClassPathFirst=true
> -Dspark.driver.userClassPathFirst=true -DskipTests clean package
> > and rebuilt spark.
> > But I got the same error when running spark-shell.
> >
> > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> > [INFO]
> > [INFO] Spark Project Parent POM ........................... SUCCESS [
> 25.092 s]
> > [INFO] Spark Project Tags ................................. SUCCESS [
> 22.093 s]
> > [INFO] Spark Project Sketch ............................... SUCCESS [
> 19.546 s]
> > [INFO] Spark Project Local DB ............................. SUCCESS [
> 10.468 s]
> > [INFO] Spark Project Networking ........................... SUCCESS [
> 17.733 s]
> > [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
> 6.531 s]
> > [INFO] Spark Project Unsafe ............................... SUCCESS [
> 25.327 s]
> > [INFO] Spark Project Launcher ............................. SUCCESS [
> 27.264 s]
> > [INFO] Spark Project Core ................................. SUCCESS
> [07:59 min]
> > [INFO] Spark Project ML Local Library ..................... SUCCESS
> [01:39 min]
> > [INFO] Spark Project GraphX ............................... SUCCESS
> [02:08 min]
> > [INFO] Spark Project Streaming ............................ SUCCESS
> [02:56 min]
> > [INFO] Spark Project Catalyst ............................. SUCCESS
> [08:55 min]
> > [INFO] Spark Project SQL .................................. SUCCESS
> [12:33 min]
> > [INFO] Spark Project ML Library ........................... SUCCESS
> [08:49 min]
> > [INFO] Spark Project Tools ................................ SUCCESS [
> 16.967 s]
> > [INFO] Spark Project Hive ................................. SUCCESS
> [06:15 min]
> > [INFO] Spark Project Graph API ............................ SUCCESS [
> 10.219 s]
> > [INFO] Spark Project Cypher ............................... SUCCESS [
> 11.952 s]
> > [INFO] Spark Project Graph ................................ SUCCESS [
> 11.171 s]
> > [INFO] Spark Project REPL ................................. SUCCESS [
> 55.029 s]
> > [INFO] Spark Project YARN Shuffle Service ................. SUCCESS
> [01:07 min]
> > [INFO] Spark Project YARN ................................. SUCCESS
> [02:22 min]
> > [INFO] Spark Project Assembly ............................. SUCCESS [
> 21.483 s]
> > [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [
> 56.450 s]
> > [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS
> [01:21 min]
> > [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS
> [02:33 min]
> > [INFO] Spark Project Examples ............................. SUCCESS
> [02:05 min]
> > [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [
> 30.780 s]
> > [INFO] Spark Avro ......................................... SUCCESS
> [01:43 min]
> > [INFO]
> ------------------------------------------------------------------------
> > [INFO] BUILD SUCCESS
> > [INFO]
> ------------------------------------------------------------------------
> > [INFO] Total time: 01:08 h
> > [INFO] Finished at: 2019-12-06T11:43:08-08:00
> > [INFO]
> ------------------------------------------------------------------------
> >
> > D:\apache\spark>spark-shell
> > 'spark-shell' is not recognized as an internal or external command,
> > operable program or batch file.
> >
> > D:\apache\spark>cd bin
> >
> > D:\apache\spark\bin>spark-shell
> > Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> > at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> > at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/1985836631.apply(Unknown
> Source)
> > at scala.Option.getOrElse(Option.scala:189)
> > at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> > at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> > at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> > at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> > at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> > at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> > at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> > Before building spark, I went to my local Maven repo and removed guava
> entirely. But after building, I found the same versions of guava had been
> downloaded again.
> > D:\mavenrepo\com\google\guava\guava>ls
> > 14.0.1 16.0.1 18.0 19.0
> > On Thu, Dec 5, 2019 at 5:12 PM Deepak Vohra <dv...@yahoo.com> wrote:
> >
> > Just to clarify, excluding Hadoop provided guava in pom.xml is an
> alternative to using an Uber jar, which is a more involved process.
> >
> > On Thursday, December 5, 2019, 10:37:39 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
> >
> > Hi Sean,
> > Thanks for your response!
> > Sorry, I didn't mention that "build/mvn ..." doesn't work. So I did go
> to Spark home directory and ran mvn from there. Following is my build and
> running result. The source code was just updated yesterday. I guess the
> POM should specify a newer Guava library somehow.
> >
> > Thanks Sean.
> > Ping
> > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> > [INFO]
> > [INFO] Spark Project Parent POM ........................... SUCCESS [
> 14.794 s]
> > [INFO] Spark Project Tags ................................. SUCCESS [
> 18.233 s]
> > [INFO] Spark Project Sketch ............................... SUCCESS [
> 20.077 s]
> > [INFO] Spark Project Local DB ............................. SUCCESS [
> 7.846 s]
> > [INFO] Spark Project Networking ........................... SUCCESS [
> 14.906 s]
> > [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
> 6.267 s]
> > [INFO] Spark Project Unsafe ............................... SUCCESS [
> 31.710 s]
> > [INFO] Spark Project Launcher ............................. SUCCESS [
> 10.227 s]
> > [INFO] Spark Project Core ................................. SUCCESS
> [08:03 min]
> > [INFO] Spark Project ML Local Library ..................... SUCCESS
> [01:51 min]
> > [INFO] Spark Project GraphX ............................... SUCCESS
> [02:20 min]
> > [INFO] Spark Project Streaming ............................ SUCCESS
> [03:16 min]
> > [INFO] Spark Project Catalyst ............................. SUCCESS
> [08:45 min]
> > [INFO] Spark Project SQL .................................. SUCCESS
> [12:12 min]
> > [INFO] Spark Project ML Library ........................... SUCCESS [
> 16:28 h]
> > [INFO] Spark Project Tools ................................ SUCCESS [
> 23.602 s]
> > [INFO] Spark Project Hive ................................. SUCCESS
> [07:50 min]
> > [INFO] Spark Project Graph API ............................ SUCCESS [
> 8.734 s]
> > [INFO] Spark Project Cypher ............................... SUCCESS [
> 12.420 s]
> > [INFO] Spark Project Graph ................................ SUCCESS [
> 10.186 s]
> > [INFO] Spark Project REPL ................................. SUCCESS
> [01:03 min]
> > [INFO] Spark Project YARN Shuffle Service ................. SUCCESS
> [01:19 min]
> > [INFO] Spark Project YARN ................................. SUCCESS
> [02:19 min]
> > [INFO] Spark Project Assembly ............................. SUCCESS [
> 18.912 s]
> > [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [
> 57.925 s]
> > [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS
> [01:20 min]
> > [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS
> [02:26 min]
> > [INFO] Spark Project Examples ............................. SUCCESS
> [02:00 min]
> > [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [
> 28.354 s]
> > [INFO] Spark Avro ......................................... SUCCESS
> [01:44 min]
> > [INFO]
> ------------------------------------------------------------------------
> > [INFO] BUILD SUCCESS
> > [INFO]
> ------------------------------------------------------------------------
> > [INFO] Total time: 17:30 h
> > [INFO] Finished at: 2019-12-05T12:20:01-08:00
> > [INFO]
> ------------------------------------------------------------------------
> >
> > D:\apache\spark>cd bin
> >
> > D:\apache\spark\bin>ls
> > beeline load-spark-env.cmd run-example spark-shell
> spark-sql2.cmd sparkR.cmd
> > beeline.cmd load-spark-env.sh run-example.cmd
> spark-shell.cmd spark-submit sparkR2.cmd
> > docker-image-tool.sh pyspark spark-class
> spark-shell2.cmd spark-submit.cmd
> > find-spark-home pyspark.cmd spark-class.cmd spark-sql
> spark-submit2.cmd
> > find-spark-home.cmd pyspark2.cmd spark-class2.cmd
> spark-sql.cmd sparkR
> >
> > D:\apache\spark\bin>spark-shell
> > Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> > at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> > at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown
> Source)
> > at scala.Option.getOrElse(Option.scala:189)
> > at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> > at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> > at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> > at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> > at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> > at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> > at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >
> > D:\apache\spark\bin>
> > On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
> >
> > What was the build error? You didn't say. Are you sure it succeeded?
> > Try running from the Spark home dir, not bin.
> > I know we do run Windows tests and it appears to pass tests, etc.
> >
> > On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> I understand Spark is preferably built on Linux. But I have a Windows
> machine with a slow VirtualBox VM for Linux, so I hope to be able to build
> and run Spark on a Windows environment.
> >>
> >> Unfortunately,
> >>
> >> # Apache Hadoop 2.6.X
> >> ./build/mvn -Pyarn -DskipTests clean package
> >>
> >> # Apache Hadoop 2.7.X and later
> >> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests
> clean package
> >>
> >>
> >> Both are listed on
> http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
> >>
> >> But neither works for me (I stay directly under the spark root directory
> and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package").
> >>
> >> I then tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1
> -DskipTests clean package"
> >>
> >> Now the build works. But when I run spark-shell, I get the following
> error.
> >>
> >> D:\apache\spark\bin>spark-shell
> >> Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> >> at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> >> at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> >> at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> >> at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> >> at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> >> at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown
> Source)
> >> at scala.Option.getOrElse(Option.scala:189)
> >> at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> >> at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> >> at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> >> at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> >> at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> >> at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> >> at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> >> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >>
> >>
> >> Has anyone experienced building and running Spark source code
> successfully on Windows? Could you please share your experience?
> >>
> >> Thanks a lot!
> >>
> >> Ping
> >>
> >
>
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Deepak Vohra <dv...@yahoo.com.INVALID>.
The initial question was to build from source. Any reason to build when binaries are available at https://spark.apache.org/downloads.html
On Tuesday, December 10, 2019, 03:05:44 AM UTC, Ping Liu <pi...@gmail.com> wrote:
Super. Thanks Deepak!
On Mon, Dec 9, 2019 at 6:58 PM Deepak Vohra <dv...@yahoo.com> wrote:
Please install Apache Spark on Windows as discussed in Apache Spark on Windows - DZone Open Source
Apache Spark on Windows - DZone Open Source
This article explains and provides solutions for some of the most common errors developers come across when inst...
On Monday, December 9, 2019, 11:27:53 p.m. UTC, Ping Liu <pi...@gmail.com> wrote:
Thanks Deepak! Yes, I want to try it with Docker. But my AWS account ran out of free period. Is there a shared EC2 for Spark that we can use for free?
Ping
On Monday, December 9, 2019, Deepak Vohra <dv...@yahoo.com> wrote:
> Haven't tested, but the general procedure is to exclude all guava dependencies that are not needed. The hadoop-common dependency does not have a dependency on guava according to Maven Repository: org.apache.hadoop » hadoop-common
>
> Maven Repository: org.apache.hadoop » hadoop-common
>
> Apache Spark 2.4 has a dependency on guava 14.
> If a Docker image for Cloudera Hadoop is used, Spark may be installed on Docker for Windows.
> For Docker on Windows on EC2 refer Getting Started with Docker for Windows - Developer.com
>
> Getting Started with Docker for Windows - Developer.com
>
> Docker for Windows makes it feasible to run a Docker daemon on Windows Server 2016. Learn to harness its power.
>
>
> Conflicting versions is not an issue if Docker is used.
> "Apache Spark applications usually have a complex set of required software dependencies. Spark applications may require specific versions of these dependencies (such as Pyspark and R) on the Spark executor hosts, sometimes with conflicting versions."
> Running Spark in Docker Containers on YARN
>
> Running Spark in Docker Containers on YARN
>
>
>
>
>
> On Monday, December 9, 2019, 08:37:47 p.m. UTC, Ping Liu <pi...@gmail.com> wrote:
>
> Hi Deepak,
> I tried it. Unfortunately, it still doesn't work. 28.1-jre isn't downloaded for some reason. I'll try something else. Thank you very much for your help!
> Ping
>
> On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra <dv...@yahoo.com> wrote:
>
> As multiple guava versions are found, exclude guava from all the dependencies it could have been pulled in with, and explicitly add a recent guava version.
> <dependency>
> <groupId>org.apache.hadoop</groupId>
> <artifactId>hadoop-common</artifactId>
> <version>3.2.1</version>
> <exclusions>
> <exclusion>
> <groupId>com.google.guava</groupId>
> <artifactId>guava</artifactId>
> </exclusion>
> </exclusions>
> </dependency>
> <dependency>
> <groupId>com.google.guava</groupId>
> <artifactId>guava</artifactId>
> <version>28.1-jre</version>
> </dependency>
> </dependencies>
> </dependencyManagement>
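If the exclusion above takes effect, running "mvn dependency:tree -Dincludes=com.google.guava:guava" should stop reporting the older guava versions. Below is a minimal sketch of scanning the tree output for conflicting guava coordinates; the "groupId:artifactId:jar:version:scope" line format is Maven's usual layout, but verify it against your own output:

```python
# Sketch: extract guava versions from `mvn dependency:tree` text output.
# The line format groupId:artifactId:jar:version:scope is Maven's usual
# layout; the sample below is illustrative, not real build output.
import re

GUAVA_RE = re.compile(r"com\.google\.guava:guava:jar:([\w.\-]+)")

def guava_versions_in_tree(tree_output):
    """Return the distinct guava versions mentioned in dependency:tree output."""
    return sorted(set(GUAVA_RE.findall(tree_output)))

sample = """\
[INFO] org.apache.spark:spark-core_2.12:jar:3.0.0-SNAPSHOT
[INFO] +- org.apache.hadoop:hadoop-client:jar:3.2.1:compile
[INFO] |  \\- com.google.guava:guava:jar:27.0-jre:compile
[INFO] \\- com.google.guava:guava:jar:14.0.1:compile
"""

print(guava_versions_in_tree(sample))  # -> ['14.0.1', '27.0-jre']
```

More than one version in the result means the exclusion did not fully take effect and an older guava may still shadow the one Hadoop needs at runtime.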
>
> On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu <pi...@gmail.com> wrote:
>
> Hi Deepak,
> Following your suggestion, I put the exclusion of guava in the topmost POM (directly under the Spark home directory) as follows.
> 2227- </dependency>
> 2228- <dependency>
> 2229- <groupId>org.apache.hadoop</groupId>
> 2230: <artifactId>hadoop-common</artifactId>
> 2231- <version>3.2.1</version>
> 2232- <exclusions>
> 2233- <exclusion>
> 2234- <groupId>com.google.guava</groupId>
> 2235- <artifactId>guava</artifactId>
> 2236- </exclusion>
> 2237- </exclusions>
> 2238- </dependency>
> 2239- </dependencies>
> 2240- </dependencyManagement>
> I also set properties for spark.executor.userClassPathFirst=true and spark.driver.userClassPathFirst=true
> D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -Dspark.executor.userClassPathFirst=true -Dspark.driver.userClassPathFirst=true -DskipTests clean package
> and rebuilt spark.
> But I got the same error when running spark-shell.
>
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ........................... SUCCESS [ 25.092 s]
> [INFO] Spark Project Tags ................................. SUCCESS [ 22.093 s]
> [INFO] Spark Project Sketch ............................... SUCCESS [ 19.546 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [ 10.468 s]
> [INFO] Spark Project Networking ........................... SUCCESS [ 17.733 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 6.531 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [ 25.327 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [ 27.264 s]
> [INFO] Spark Project Core ................................. SUCCESS [07:59 min]
> [INFO] Spark Project ML Local Library ..................... SUCCESS [01:39 min]
> [INFO] Spark Project GraphX ............................... SUCCESS [02:08 min]
> [INFO] Spark Project Streaming ............................ SUCCESS [02:56 min]
> [INFO] Spark Project Catalyst ............................. SUCCESS [08:55 min]
> [INFO] Spark Project SQL .................................. SUCCESS [12:33 min]
> [INFO] Spark Project ML Library ........................... SUCCESS [08:49 min]
> [INFO] Spark Project Tools ................................ SUCCESS [ 16.967 s]
> [INFO] Spark Project Hive ................................. SUCCESS [06:15 min]
> [INFO] Spark Project Graph API ............................ SUCCESS [ 10.219 s]
> [INFO] Spark Project Cypher ............................... SUCCESS [ 11.952 s]
> [INFO] Spark Project Graph ................................ SUCCESS [ 11.171 s]
> [INFO] Spark Project REPL ................................. SUCCESS [ 55.029 s]
> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:07 min]
> [INFO] Spark Project YARN ................................. SUCCESS [02:22 min]
> [INFO] Spark Project Assembly ............................. SUCCESS [ 21.483 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 56.450 s]
> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:21 min]
> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:33 min]
> [INFO] Spark Project Examples ............................. SUCCESS [02:05 min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 30.780 s]
> [INFO] Spark Avro ......................................... SUCCESS [01:43 min]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 01:08 h
> [INFO] Finished at: 2019-12-06T11:43:08-08:00
> [INFO] ------------------------------------------------------------------------
>
> D:\apache\spark>spark-shell
> 'spark-shell' is not recognized as an internal or external command,
> operable program or batch file.
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit$$Lambda$132/1985836631.apply(Unknown Source)
> at scala.Option.getOrElse(Option.scala:189)
> at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Before building Spark, I went to my local Maven repo and removed Guava entirely. But after building, I found the same versions of Guava had been downloaded again.
> D:\mavenrepo\com\google\guava\guava>ls
> 14.0.1 16.0.1 18.0 19.0
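Those versions line up with the error: the failing signature, checkArgument(boolean, String, Object), was only added to Guava around version 20, so Hadoop 3.2 classes compiled against a newer Guava cannot find it in any of the jars listed above (14-19). Two hedged diagnostics (the jar path is illustrative; point it at whichever Guava jar is on spark-shell's runtime classpath):

```shell
# Inspect which checkArgument overloads a given Guava jar actually provides:
javap -classpath guava-14.0.1.jar com.google.common.base.Preconditions | grep checkArgument

# Force Maven to re-resolve Guava instead of reusing cached copies:
mvn dependency:purge-local-repository -DmanualInclude=com.google.guava:guava
```

If the three-argument (boolean, String, Object) overload is absent from the javap output, that jar is the one Hadoop is tripping over.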
> On Thu, Dec 5, 2019 at 5:12 PM Deepak Vohra <dv...@yahoo.com> wrote:
>
> Just to clarify, excluding Hadoop provided guava in pom.xml is an alternative to using an Uber jar, which is a more involved process.
>
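The uber-jar route mentioned above usually means relocating Guava with the Maven Shade plugin, so that Hadoop's calls resolve against a private, renamed copy rather than the Guava 14 Spark ships. A minimal sketch, with the plugin version and relocation prefix as illustrative assumptions:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Move Guava classes to a private package so they cannot
               clash with the older Guava already on the classpath. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

This is why it is described as more involved: the bytecode of the shaded classes is rewritten, and every consumer has to use the shaded artifact.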
> On Thursday, December 5, 2019, 10:37:39 p.m. UTC, Ping Liu <pi...@gmail.com> wrote:
>
> Hi Sean,
> Thanks for your response!
> Sorry, I didn't mention that "build/mvn ..." doesn't work. So I did go to the Spark home directory and ran mvn from there. Following is my build and run result. The source code was just updated yesterday. I guess the POM should somehow specify a newer Guava library.
>
> Thanks Sean.
> Ping
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ........................... SUCCESS [ 14.794 s]
> [INFO] Spark Project Tags ................................. SUCCESS [ 18.233 s]
> [INFO] Spark Project Sketch ............................... SUCCESS [ 20.077 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [ 7.846 s]
> [INFO] Spark Project Networking ........................... SUCCESS [ 14.906 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 6.267 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [ 31.710 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [ 10.227 s]
> [INFO] Spark Project Core ................................. SUCCESS [08:03 min]
> [INFO] Spark Project ML Local Library ..................... SUCCESS [01:51 min]
> [INFO] Spark Project GraphX ............................... SUCCESS [02:20 min]
> [INFO] Spark Project Streaming ............................ SUCCESS [03:16 min]
> [INFO] Spark Project Catalyst ............................. SUCCESS [08:45 min]
> [INFO] Spark Project SQL .................................. SUCCESS [12:12 min]
> [INFO] Spark Project ML Library ........................... SUCCESS [ 16:28 h]
> [INFO] Spark Project Tools ................................ SUCCESS [ 23.602 s]
> [INFO] Spark Project Hive ................................. SUCCESS [07:50 min]
> [INFO] Spark Project Graph API ............................ SUCCESS [ 8.734 s]
> [INFO] Spark Project Cypher ............................... SUCCESS [ 12.420 s]
> [INFO] Spark Project Graph ................................ SUCCESS [ 10.186 s]
> [INFO] Spark Project REPL ................................. SUCCESS [01:03 min]
> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:19 min]
> [INFO] Spark Project YARN ................................. SUCCESS [02:19 min]
> [INFO] Spark Project Assembly ............................. SUCCESS [ 18.912 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 57.925 s]
> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:20 min]
> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:26 min]
> [INFO] Spark Project Examples ............................. SUCCESS [02:00 min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 28.354 s]
> [INFO] Spark Avro ......................................... SUCCESS [01:44 min]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 17:30 h
> [INFO] Finished at: 2019-12-05T12:20:01-08:00
> [INFO] ------------------------------------------------------------------------
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>ls
> beeline load-spark-env.cmd run-example spark-shell spark-sql2.cmd sparkR.cmd
> beeline.cmd load-spark-env.sh run-example.cmd spark-shell.cmd spark-submit sparkR2.cmd
> docker-image-tool.sh pyspark spark-class spark-shell2.cmd spark-submit.cmd
> find-spark-home pyspark.cmd spark-class.cmd spark-sql spark-submit2.cmd
> find-spark-home.cmd pyspark2.cmd spark-class2.cmd spark-sql.cmd sparkR
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
> at scala.Option.getOrElse(Option.scala:189)
> at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> D:\apache\spark\bin>
> On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
>
> What was the build error? You didn't say. Are you sure it succeeded?
> Try running from the Spark home dir, not bin.
> I know we do run Windows tests and it appears to pass tests, etc.
>
> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
>>
>> Hello,
>>
>> I understand Spark is preferably built on Linux. But I have a Windows machine with a slow VirtualBox for Linux, so I wish I were able to build and run Spark code in a Windows environment.
>>
>> Unfortunately,
>>
>> # Apache Hadoop 2.6.X
>> ./build/mvn -Pyarn -DskipTests clean package
>>
>> # Apache Hadoop 2.7.X and later
>> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
>>
>>
>> Both are listed on http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
>>
>> But neither works for me (I stay directly under the Spark root directory and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package").
>>
>> Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests clean package"
>>
>> Now the build works. But when I run spark-shell, I get the following error.
>>
>> D:\apache\spark\bin>spark-shell
>> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
>> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
>> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
>> at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>> at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
>> at scala.Option.getOrElse(Option.scala:189)
>> at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
>> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
>> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>>
>> Has anyone experienced building and running Spark source code successfully on Windows? Could you please share your experience?
>>
>> Thanks a lot!
>>
>> Ping
>>
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Ping Liu <pi...@gmail.com>.
Super. Thanks Deepak!
On Mon, Dec 9, 2019 at 6:58 PM Deepak Vohra <dv...@yahoo.com> wrote:
> Please install Apache Spark on Windows as discussed in Apache Spark on
> Windows - DZone Open Source
> <https://dzone.com/articles/working-on-apache-spark-on-windows>
>
>
>
>
> On Monday, December 9, 2019, 11:27:53 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
>
>
> Thanks Deepak! Yes, I want to try it with Docker. But my AWS account has run out of its free period. Is there a shared EC2 instance for Spark that we can use for free?
>
> Ping
>
>
> On Monday, December 9, 2019, Deepak Vohra <dv...@yahoo.com> wrote:
> > Haven't tested, but the general procedure is to exclude all Guava dependencies that are not needed. The hadoop-common dependency does not have a dependency on Guava according to Maven Repository: org.apache.hadoop » hadoop-common
> >
> > Maven Repository: org.apache.hadoop » hadoop-common
> >
> > Apache Spark 2.4 has a dependency on Guava 14.
> > If a Docker image for Cloudera Hadoop is used, Spark may be installed on Docker for Windows.
> > For Docker on Windows on EC2 refer Getting Started with Docker for
> Windows - Developer.com
> >
> > Getting Started with Docker for Windows - Developer.com
> >
> > Docker for Windows makes it feasible to run a Docker daemon on Windows
> Server 2016. Learn to harness its power.
> >
> >
> > Conflicting versions is not an issue if Docker is used.
> > "Apache Spark applications usually have a complex set of required
> software dependencies. Spark applications may require specific versions of
> these dependencies (such as Pyspark and R) on the Spark executor hosts,
> sometimes with conflicting versions."
> > Running Spark in Docker Containers on YARN
> >
> >
> >
> >
> >
> >
> > On Monday, December 9, 2019, 08:37:47 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
> >
> > Hi Deepak,
> > I tried it. Unfortunately, it still doesn't work. 28.1-jre isn't
> downloaded for somehow. I'll try something else. Thank you very much for
> your help!
> > Ping
> >
> > On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra <dv...@yahoo.com> wrote:
> >
> > As multiple guava versions are found exclude guava from all the
> dependecies it could have been downloaded with. And explicitly add a recent
> guava version.
> > <dependency>
> > <groupId>org.apache.hadoop</groupId>
> > <artifactId>hadoop-common</artifactId>
> > <version>3.2.1</version>
> > <exclusions>
> > <exclusion>
> > <groupId>com.google.guava</groupId>
> > <artifactId>guava</artifactId>
> > </exclusion>
> > </exclusions>
> > </dependency>
> > <dependency>
> > <groupId>com.google.guava</groupId>
> > <artifactId>guava</artifactId>
> > <version>28.1-jre</version>
> > </dependency>
> > </dependencies>
> > </dependencyManagement>
> >
> > On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
> >
> > Hi Deepak,
> > Following your suggestion, I put exclusion of guava in topmost POM
> (under Spark home directly) as follows.
> > 2227- </dependency>
> > 2228- <dependency>
> > 2229- <groupId>org.apache.hadoop</groupId>
> > 2230: <artifactId>hadoop-common</artifactId>
> > 2231- <version>3.2.1</version>
> > 2232- <exclusions>
> > 2233- <exclusion>
> > 2234- <groupId>com.google.guava</groupId>
> > 2235- <artifactId>guava</artifactId>
> > 2236- </exclusion>
> > 2237- </exclusions>
> > 2238- </dependency>
> > 2239- </dependencies>
> > 2240- </dependencyManagement>
> > I also set properties for spark.executor.userClassPathFirst=true and
> spark.driver.userClassPathFirst=true
> > D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1
> -Dspark.executor.userClassPathFirst=true
> -Dspark.driver.userClassPathFirst=true -DskipTests clean package
> > and rebuilt spark.
> > But I got the same error when running spark-shell.
> >
> > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> > [INFO]
> > [INFO] Spark Project Parent POM ........................... SUCCESS [
> 25.092 s]
> > [INFO] Spark Project Tags ................................. SUCCESS [
> 22.093 s]
> > [INFO] Spark Project Sketch ............................... SUCCESS [
> 19.546 s]
> > [INFO] Spark Project Local DB ............................. SUCCESS [
> 10.468 s]
> > [INFO] Spark Project Networking ........................... SUCCESS [
> 17.733 s]
> > [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
> 6.531 s]
> > [INFO] Spark Project Unsafe ............................... SUCCESS [
> 25.327 s]
> > [INFO] Spark Project Launcher ............................. SUCCESS [
> 27.264 s]
> > [INFO] Spark Project Core ................................. SUCCESS
> [07:59 min]
> > [INFO] Spark Project ML Local Library ..................... SUCCESS
> [01:39 min]
> > [INFO] Spark Project GraphX ............................... SUCCESS
> [02:08 min]
> > [INFO] Spark Project Streaming ............................ SUCCESS
> [02:56 min]
> > [INFO] Spark Project Catalyst ............................. SUCCESS
> [08:55 min]
> > [INFO] Spark Project SQL .................................. SUCCESS
> [12:33 min]
> > [INFO] Spark Project ML Library ........................... SUCCESS
> [08:49 min]
> > [INFO] Spark Project Tools ................................ SUCCESS [
> 16.967 s]
> > [INFO] Spark Project Hive ................................. SUCCESS
> [06:15 min]
> > [INFO] Spark Project Graph API ............................ SUCCESS [
> 10.219 s]
> > [INFO] Spark Project Cypher ............................... SUCCESS [
> 11.952 s]
> > [INFO] Spark Project Graph ................................ SUCCESS [
> 11.171 s]
> > [INFO] Spark Project REPL ................................. SUCCESS [
> 55.029 s]
> > [INFO] Spark Project YARN Shuffle Service ................. SUCCESS
> [01:07 min]
> > [INFO] Spark Project YARN ................................. SUCCESS
> [02:22 min]
> > [INFO] Spark Project Assembly ............................. SUCCESS [
> 21.483 s]
> > [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [
> 56.450 s]
> > [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS
> [01:21 min]
> > [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS
> [02:33 min]
> > [INFO] Spark Project Examples ............................. SUCCESS
> [02:05 min]
> > [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [
> 30.780 s]
> > [INFO] Spark Avro ......................................... SUCCESS
> [01:43 min]
> > [INFO]
> ------------------------------------------------------------------------
> > [INFO] BUILD SUCCESS
> > [INFO]
> ------------------------------------------------------------------------
> > [INFO] Total time: 01:08 h
> > [INFO] Finished at: 2019-12-06T11:43:08-08:00
> > [INFO]
> ------------------------------------------------------------------------
> >
> > D:\apache\spark>spark-shell
> > 'spark-shell' is not recognized as an internal or external command,
> > operable program or batch file.
> >
> > D:\apache\spark>cd bin
> >
> > D:\apache\spark\bin>spark-shell
> > Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> > at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> > at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/1985836631.apply(Unknown
> Source)
> > at scala.Option.getOrElse(Option.scala:189)
> > at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> > at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> > at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> > at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> > at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> > at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> > at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> > Before building spark, I went to my local Maven repo and removed guava
> at all. But after building, I found the same versions of guava have been
> downloaded.
> > D:\mavenrepo\com\google\guava\guava>ls
> > 14.0.1 16.0.1 18.0 19.0
> > On Thu, Dec 5, 2019 at 5:12 PM Deepak Vohra <dv...@yahoo.com> wrote:
> >
> > Just to clarify, excluding Hadoop provided guava in pom.xml is an
> alternative to using an Uber jar, which is a more involved process.
> >
> > On Thursday, December 5, 2019, 10:37:39 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
> >
> > Hi Sean,
> > Thanks for your response!
> > Sorry, I didn't mention that "build/mvn ..." doesn't work. So I did go
> to Spark home directory and ran mvn from there. Following is my build and
> running result. The source code was just updated yesterday. I guess the
> POM should specify newer Guava library somehow.
> >
> > Thanks Sean.
> > Ping
> > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> > [INFO]
> > [INFO] Spark Project Parent POM ........................... SUCCESS [
> 14.794 s]
> > [INFO] Spark Project Tags ................................. SUCCESS [
> 18.233 s]
> > [INFO] Spark Project Sketch ............................... SUCCESS [
> 20.077 s]
> > [INFO] Spark Project Local DB ............................. SUCCESS [
> 7.846 s]
> > [INFO] Spark Project Networking ........................... SUCCESS [
> 14.906 s]
> > [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
> 6.267 s]
> > [INFO] Spark Project Unsafe ............................... SUCCESS [
> 31.710 s]
> > [INFO] Spark Project Launcher ............................. SUCCESS [
> 10.227 s]
> > [INFO] Spark Project Core ................................. SUCCESS
> [08:03 min]
> > [INFO] Spark Project ML Local Library ..................... SUCCESS
> [01:51 min]
> > [INFO] Spark Project GraphX ............................... SUCCESS
> [02:20 min]
> > [INFO] Spark Project Streaming ............................ SUCCESS
> [03:16 min]
> > [INFO] Spark Project Catalyst ............................. SUCCESS
> [08:45 min]
> > [INFO] Spark Project SQL .................................. SUCCESS
> [12:12 min]
> > [INFO] Spark Project ML Library ........................... SUCCESS [
> 16:28 h]
> > [INFO] Spark Project Tools ................................ SUCCESS [
> 23.602 s]
> > [INFO] Spark Project Hive ................................. SUCCESS
> [07:50 min]
> > [INFO] Spark Project Graph API ............................ SUCCESS [
> 8.734 s]
> > [INFO] Spark Project Cypher ............................... SUCCESS [
> 12.420 s]
> > [INFO] Spark Project Graph ................................ SUCCESS [
> 10.186 s]
> > [INFO] Spark Project REPL ................................. SUCCESS
> [01:03 min]
> > [INFO] Spark Project YARN Shuffle Service ................. SUCCESS
> [01:19 min]
> > [INFO] Spark Project YARN ................................. SUCCESS
> [02:19 min]
> > [INFO] Spark Project Assembly ............................. SUCCESS [
> 18.912 s]
> > [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [
> 57.925 s]
> > [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS
> [01:20 min]
> > [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS
> [02:26 min]
> > [INFO] Spark Project Examples ............................. SUCCESS
> [02:00 min]
> > [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [
> 28.354 s]
> > [INFO] Spark Avro ......................................... SUCCESS
> [01:44 min]
> > [INFO]
> ------------------------------------------------------------------------
> > [INFO] BUILD SUCCESS
> > [INFO]
> ------------------------------------------------------------------------
> > [INFO] Total time: 17:30 h
> > [INFO] Finished at: 2019-12-05T12:20:01-08:00
> > [INFO]
> ------------------------------------------------------------------------
> >
> > D:\apache\spark>cd bin
> >
> > D:\apache\spark\bin>ls
> > beeline load-spark-env.cmd run-example spark-shell
> spark-sql2.cmd sparkR.cmd
> > beeline.cmd load-spark-env.sh run-example.cmd
> spark-shell.cmd spark-submit sparkR2.cmd
> > docker-image-tool.sh pyspark spark-class
> spark-shell2.cmd spark-submit.cmd
> > find-spark-home pyspark.cmd spark-class.cmd spark-sql
> spark-submit2.cmd
> > find-spark-home.cmd pyspark2.cmd spark-class2.cmd
> spark-sql.cmd sparkR
> >
> > D:\apache\spark\bin>spark-shell
> > Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> > at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> > at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown
> Source)
> > at scala.Option.getOrElse(Option.scala:189)
> > at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> > at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> > at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> > at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> > at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> > at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> > at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >
> > D:\apache\spark\bin>
> > On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
> >
> > What was the build error? you didn't say. Are you sure it succeeded?
> > Try running from the Spark home dir, not bin.
> > I know we do run Windows tests and it appears to pass tests, etc.
> >
> > On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> I understand Spark is preferably built on Linux. But I have a Windows
> machine with a slow Virtual Box for Linux. So I wish I am able to build
> and run Spark code on Windows environment.
> >>
> >> Unfortunately,
> >>
> >> # Apache Hadoop 2.6.X
> >> ./build/mvn -Pyarn -DskipTests clean package
> >>
> >> # Apache Hadoop 2.7.X and later
> >> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests
> clean package
> >>
> >>
> >> Both are listed on
> http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
> >>
> >> But neither works for me (I stay directly under spark root directory
> and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package"
> >>
> >> and
> >>
> >> Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1
> -DskipTests clean package"
> >>
> >> Now build works. But when I run spark-shell. I got the following
> error.
> >>
> >> D:\apache\spark\bin>spark-shell
> >> Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> >> at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> >> at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> >> at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> >> at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> >> at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> >> at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown
> Source)
> >> at scala.Option.getOrElse(Option.scala:189)
> >> at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> >> at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> >> at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> >> at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> >> at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> >> at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> >> at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> >> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >>
> >>
> >> Has anyone experienced building and running Spark source code
> successfully on Windows? Could you please share your experience?
> >>
> >> Thanks a lot!
> >>
> >> Ping
> >>
> >
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Ping Liu <pi...@gmail.com>.
Super. Thanks Deepak!
On Mon, Dec 9, 2019 at 6:58 PM Deepak Vohra <dv...@yahoo.com> wrote:
> Please install Apache Spark on Windows as discussed in Apache Spark on
> Windows - DZone Open Source
> <https://dzone.com/articles/working-on-apache-spark-on-windows>
>
> Apache Spark on Windows - DZone Open Source
>
> This article explains and provides solutions for some of the most common
> errors developers come across when inst...
> <https://dzone.com/articles/working-on-apache-spark-on-windows>
>
>
>
> On Monday, December 9, 2019, 11:27:53 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
>
>
> Thanks Deepak! Yes, I want to try it with Docker. But my AWS account ran
> out of free period. Is there a shared EC2 for Spark that we can use for
> free?
>
> Ping
>
>
> On Monday, December 9, 2019, Deepak Vohra <dv...@yahoo.com> wrote:
> > Haven't tested but the general procedure is to exclude all guava
> dependencies that are not needed. The hadoop-common depedency does not have
> a dependency on guava according to Maven Repository: org.apache.hadoop »
> hadoop-common
> >
> > Maven Repository: org.apache.hadoop » hadoop-common
> >
> > Apache Spark 2.4 has dependency on guava 14.
> > If a Docker image for Cloudera Hadoop is used Spark is may be installed
> on Docker for Windows.
> > For Docker on Windows on EC2 refer Getting Started with Docker for
> Windows - Developer.com
> >
> > Getting Started with Docker for Windows - Developer.com
> >
> > Docker for Windows makes it feasible to run a Docker daemon on Windows
> Server 2016. Learn to harness its power.
> >
> >
> > Conflicting versions is not an issue if Docker is used.
> > "Apache Spark applications usually have a complex set of required
> software dependencies. Spark applications may require specific versions of
> these dependencies (such as Pyspark and R) on the Spark executor hosts,
> sometimes with conflicting versions."
> > Running Spark in Docker Containers on YARN
> >
> > Running Spark in Docker Containers on YARN
> >
> >
> >
> >
> >
> > On Monday, December 9, 2019, 08:37:47 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
> >
> > Hi Deepak,
> > I tried it. Unfortunately, it still doesn't work. 28.1-jre isn't
> downloaded for some reason. I'll try something else. Thank you very much for
> your help!
> > Ping
> >
> > On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra <dv...@yahoo.com> wrote:
> >
> > As multiple guava versions are found, exclude guava from all the
> dependencies it could have been downloaded with, and explicitly add a recent
> guava version.
> > <dependencyManagement>
> >   <dependencies>
> >     <dependency>
> >       <groupId>org.apache.hadoop</groupId>
> >       <artifactId>hadoop-common</artifactId>
> >       <version>3.2.1</version>
> >       <exclusions>
> >         <exclusion>
> >           <groupId>com.google.guava</groupId>
> >           <artifactId>guava</artifactId>
> >         </exclusion>
> >       </exclusions>
> >     </dependency>
> >     <dependency>
> >       <groupId>com.google.guava</groupId>
> >       <artifactId>guava</artifactId>
> >       <version>28.1-jre</version>
> >     </dependency>
> >   </dependencies>
> > </dependencyManagement>
> >
> > On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
> >
> > Hi Deepak,
> > Following your suggestion, I put exclusion of guava in topmost POM
> (under Spark home directly) as follows.
> > 2227- </dependency>
> > 2228- <dependency>
> > 2229- <groupId>org.apache.hadoop</groupId>
> > 2230: <artifactId>hadoop-common</artifactId>
> > 2231- <version>3.2.1</version>
> > 2232- <exclusions>
> > 2233- <exclusion>
> > 2234- <groupId>com.google.guava</groupId>
> > 2235- <artifactId>guava</artifactId>
> > 2236- </exclusion>
> > 2237- </exclusions>
> > 2238- </dependency>
> > 2239- </dependencies>
> > 2240- </dependencyManagement>
> > I also set properties for spark.executor.userClassPathFirst=true and
> spark.driver.userClassPathFirst=true
> > D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1
> -Dspark.executor.userClassPathFirst=true
> -Dspark.driver.userClassPathFirst=true -DskipTests clean package
> > and rebuilt spark.
> > But I got the same error when running spark-shell.
> >
> > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> > [INFO]
> > [INFO] Spark Project Parent POM ........................... SUCCESS [
> 25.092 s]
> > [INFO] Spark Project Tags ................................. SUCCESS [
> 22.093 s]
> > [INFO] Spark Project Sketch ............................... SUCCESS [
> 19.546 s]
> > [INFO] Spark Project Local DB ............................. SUCCESS [
> 10.468 s]
> > [INFO] Spark Project Networking ........................... SUCCESS [
> 17.733 s]
> > [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
> 6.531 s]
> > [INFO] Spark Project Unsafe ............................... SUCCESS [
> 25.327 s]
> > [INFO] Spark Project Launcher ............................. SUCCESS [
> 27.264 s]
> > [INFO] Spark Project Core ................................. SUCCESS
> [07:59 min]
> > [INFO] Spark Project ML Local Library ..................... SUCCESS
> [01:39 min]
> > [INFO] Spark Project GraphX ............................... SUCCESS
> [02:08 min]
> > [INFO] Spark Project Streaming ............................ SUCCESS
> [02:56 min]
> > [INFO] Spark Project Catalyst ............................. SUCCESS
> [08:55 min]
> > [INFO] Spark Project SQL .................................. SUCCESS
> [12:33 min]
> > [INFO] Spark Project ML Library ........................... SUCCESS
> [08:49 min]
> > [INFO] Spark Project Tools ................................ SUCCESS [
> 16.967 s]
> > [INFO] Spark Project Hive ................................. SUCCESS
> [06:15 min]
> > [INFO] Spark Project Graph API ............................ SUCCESS [
> 10.219 s]
> > [INFO] Spark Project Cypher ............................... SUCCESS [
> 11.952 s]
> > [INFO] Spark Project Graph ................................ SUCCESS [
> 11.171 s]
> > [INFO] Spark Project REPL ................................. SUCCESS [
> 55.029 s]
> > [INFO] Spark Project YARN Shuffle Service ................. SUCCESS
> [01:07 min]
> > [INFO] Spark Project YARN ................................. SUCCESS
> [02:22 min]
> > [INFO] Spark Project Assembly ............................. SUCCESS [
> 21.483 s]
> > [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [
> 56.450 s]
> > [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS
> [01:21 min]
> > [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS
> [02:33 min]
> > [INFO] Spark Project Examples ............................. SUCCESS
> [02:05 min]
> > [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [
> 30.780 s]
> > [INFO] Spark Avro ......................................... SUCCESS
> [01:43 min]
> > [INFO]
> ------------------------------------------------------------------------
> > [INFO] BUILD SUCCESS
> > [INFO]
> ------------------------------------------------------------------------
> > [INFO] Total time: 01:08 h
> > [INFO] Finished at: 2019-12-06T11:43:08-08:00
> > [INFO]
> ------------------------------------------------------------------------
> >
> > D:\apache\spark>spark-shell
> > 'spark-shell' is not recognized as an internal or external command,
> > operable program or batch file.
> >
> > D:\apache\spark>cd bin
> >
> > D:\apache\spark\bin>spark-shell
> > Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> > at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> > at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/1985836631.apply(Unknown
> Source)
> > at scala.Option.getOrElse(Option.scala:189)
> > at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> > at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> > at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> > at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> > at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> > at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> > at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> > Before building spark, I went to my local Maven repo and removed guava
> entirely. But after building, I found the same versions of guava had been
> downloaded.
> > D:\mavenrepo\com\google\guava\guava>ls
> > 14.0.1 16.0.1 18.0 19.0
> > On Thu, Dec 5, 2019 at 5:12 PM Deepak Vohra <dv...@yahoo.com> wrote:
> >
> > Just to clarify, excluding the Hadoop-provided guava in pom.xml is an
> alternative to using an Uber jar, which is a more involved process.
> >
> > On Thursday, December 5, 2019, 10:37:39 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
> >
> > Hi Sean,
> > Thanks for your response!
> > Sorry, I didn't mention that "build/mvn ..." doesn't work. So I did go
> to the Spark home directory and ran mvn from there. Following is my build and
> running result. The source code was just updated yesterday. I guess the
> POM should specify a newer Guava library somehow.
> >
> > Thanks Sean.
> > Ping
> > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> > [INFO]
> > [INFO] Spark Project Parent POM ........................... SUCCESS [
> 14.794 s]
> > [INFO] Spark Project Tags ................................. SUCCESS [
> 18.233 s]
> > [INFO] Spark Project Sketch ............................... SUCCESS [
> 20.077 s]
> > [INFO] Spark Project Local DB ............................. SUCCESS [
> 7.846 s]
> > [INFO] Spark Project Networking ........................... SUCCESS [
> 14.906 s]
> > [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
> 6.267 s]
> > [INFO] Spark Project Unsafe ............................... SUCCESS [
> 31.710 s]
> > [INFO] Spark Project Launcher ............................. SUCCESS [
> 10.227 s]
> > [INFO] Spark Project Core ................................. SUCCESS
> [08:03 min]
> > [INFO] Spark Project ML Local Library ..................... SUCCESS
> [01:51 min]
> > [INFO] Spark Project GraphX ............................... SUCCESS
> [02:20 min]
> > [INFO] Spark Project Streaming ............................ SUCCESS
> [03:16 min]
> > [INFO] Spark Project Catalyst ............................. SUCCESS
> [08:45 min]
> > [INFO] Spark Project SQL .................................. SUCCESS
> [12:12 min]
> > [INFO] Spark Project ML Library ........................... SUCCESS [
> 16:28 h]
> > [INFO] Spark Project Tools ................................ SUCCESS [
> 23.602 s]
> > [INFO] Spark Project Hive ................................. SUCCESS
> [07:50 min]
> > [INFO] Spark Project Graph API ............................ SUCCESS [
> 8.734 s]
> > [INFO] Spark Project Cypher ............................... SUCCESS [
> 12.420 s]
> > [INFO] Spark Project Graph ................................ SUCCESS [
> 10.186 s]
> > [INFO] Spark Project REPL ................................. SUCCESS
> [01:03 min]
> > [INFO] Spark Project YARN Shuffle Service ................. SUCCESS
> [01:19 min]
> > [INFO] Spark Project YARN ................................. SUCCESS
> [02:19 min]
> > [INFO] Spark Project Assembly ............................. SUCCESS [
> 18.912 s]
> > [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [
> 57.925 s]
> > [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS
> [01:20 min]
> > [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS
> [02:26 min]
> > [INFO] Spark Project Examples ............................. SUCCESS
> [02:00 min]
> > [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [
> 28.354 s]
> > [INFO] Spark Avro ......................................... SUCCESS
> [01:44 min]
> > [INFO]
> ------------------------------------------------------------------------
> > [INFO] BUILD SUCCESS
> > [INFO]
> ------------------------------------------------------------------------
> > [INFO] Total time: 17:30 h
> > [INFO] Finished at: 2019-12-05T12:20:01-08:00
> > [INFO]
> ------------------------------------------------------------------------
> >
> > D:\apache\spark>cd bin
> >
> > D:\apache\spark\bin>ls
> > beeline load-spark-env.cmd run-example spark-shell
> spark-sql2.cmd sparkR.cmd
> > beeline.cmd load-spark-env.sh run-example.cmd
> spark-shell.cmd spark-submit sparkR2.cmd
> > docker-image-tool.sh pyspark spark-class
> spark-shell2.cmd spark-submit.cmd
> > find-spark-home pyspark.cmd spark-class.cmd spark-sql
> spark-submit2.cmd
> > find-spark-home.cmd pyspark2.cmd spark-class2.cmd
> spark-sql.cmd sparkR
> >
> > D:\apache\spark\bin>spark-shell
> > Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> > at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> > at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown
> Source)
> > at scala.Option.getOrElse(Option.scala:189)
> > at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> > at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> > at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> > at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> > at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> > at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> > at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >
> > D:\apache\spark\bin>
> > On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
> >
> > What was the build error? you didn't say. Are you sure it succeeded?
> > Try running from the Spark home dir, not bin.
> > I know we do run Windows tests and it appears to pass tests, etc.
> >
> > On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> I understand Spark is preferably built on Linux. But I have a Windows
> machine with a slow Virtual Box for Linux. So I hope to be able to build
> and run Spark code in a Windows environment.
> >>
> >> Unfortunately,
> >>
> >> # Apache Hadoop 2.6.X
> >> ./build/mvn -Pyarn -DskipTests clean package
> >>
> >> # Apache Hadoop 2.7.X and later
> >> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests
> clean package
> >>
> >>
> >> Both are listed on
> http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
> >>
> >> But neither works for me (I stay directly under spark root directory
> and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package")
> >>
> >>
> >> Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1
> -DskipTests clean package"
> >>
> >> Now the build works. But when I run spark-shell, I get the following
> error.
> >>
> >> D:\apache\spark\bin>spark-shell
> >> Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> >> at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> >> at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> >> at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> >> at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> >> at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> >> at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown
> Source)
> >> at scala.Option.getOrElse(Option.scala:189)
> >> at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> >> at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> >> at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> >> at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> >> at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> >> at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> >> at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> >> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >>
> >>
> >> Has anyone experienced building and running Spark source code
> successfully on Windows? Could you please share your experience?
> >>
> >> Thanks a lot!
> >>
> >> Ping
> >>
> >
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Deepak Vohra <dv...@yahoo.com.INVALID>.
Please install Apache Spark on Windows as discussed in Apache Spark on Windows - DZone Open Source
Apache Spark on Windows - DZone Open Source
This article explains and provides solutions for some of the most common errors developers come across when inst...
On Monday, December 9, 2019, 11:27:53 p.m. UTC, Ping Liu <pi...@gmail.com> wrote:
Thanks Deepak! Yes, I want to try it with Docker. But my AWS account ran out of its free period. Is there a shared EC2 for Spark that we can use for free?
Ping
On Monday, December 9, 2019, Deepak Vohra <dv...@yahoo.com> wrote:
> Haven't tested, but the general procedure is to exclude all guava dependencies that are not needed. The hadoop-common dependency does not have a dependency on guava according to Maven Repository: org.apache.hadoop » hadoop-common
>
> Maven Repository: org.apache.hadoop » hadoop-common
>
> Apache Spark 2.4 has a dependency on guava 14.
> If a Docker image for Cloudera Hadoop is used, Spark may be installed on Docker for Windows.
> For Docker on Windows on EC2, refer to Getting Started with Docker for Windows - Developer.com
>
> Getting Started with Docker for Windows - Developer.com
>
> Docker for Windows makes it feasible to run a Docker daemon on Windows Server 2016. Learn to harness its power.
>
>
> Conflicting versions is not an issue if Docker is used.
> "Apache Spark applications usually have a complex set of required software dependencies. Spark applications may require specific versions of these dependencies (such as Pyspark and R) on the Spark executor hosts, sometimes with conflicting versions."
> Running Spark in Docker Containers on YARN
>
> Running Spark in Docker Containers on YARN
>
>
>
>
>
> On Monday, December 9, 2019, 08:37:47 p.m. UTC, Ping Liu <pi...@gmail.com> wrote:
>
> Hi Deepak,
> I tried it. Unfortunately, it still doesn't work. 28.1-jre isn't downloaded for some reason. I'll try something else. Thank you very much for your help!
> Ping
>
> On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra <dv...@yahoo.com> wrote:
>
> As multiple guava versions are found, exclude guava from all the dependencies it could have been downloaded with, and explicitly add a recent guava version.
> <dependencyManagement>
>   <dependencies>
>     <dependency>
>       <groupId>org.apache.hadoop</groupId>
>       <artifactId>hadoop-common</artifactId>
>       <version>3.2.1</version>
>       <exclusions>
>         <exclusion>
>           <groupId>com.google.guava</groupId>
>           <artifactId>guava</artifactId>
>         </exclusion>
>       </exclusions>
>     </dependency>
>     <dependency>
>       <groupId>com.google.guava</groupId>
>       <artifactId>guava</artifactId>
>       <version>28.1-jre</version>
>     </dependency>
>   </dependencies>
> </dependencyManagement>
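For intuition about why the exclusion matters at all, here is a toy sketch of Maven's "nearest wins" version mediation: the shallowest dependency path to an artifact supplies the version, so an old guava declared near the root shadows the newer one a transitive dependency needs. The paths, depths, and versions below are illustrative assumptions, not Spark's actual dependency tree.

```python
# Toy sketch of Maven "nearest wins" mediation: among all dependency
# paths reaching an artifact, the shallowest path supplies the version;
# ties are broken by declaration order. Paths/versions are illustrative.
def mediate(candidates):
    best = min(candidates, key=lambda c: (c["depth"], c["order"]))
    return best["version"]

guava_candidates = [
    # guava declared directly (depth 1) at an old version...
    {"via": "spark -> guava", "depth": 1, "order": 0, "version": "14.0.1"},
    # ...shadows the newer guava that hadoop-common pulls in (depth 2).
    {"via": "spark -> hadoop-common -> guava", "depth": 2, "order": 1,
     "version": "27.0-jre"},
]

print(mediate(guava_candidates))  # the shallow, old 14.0.1 wins
```

Excluding guava from the shallow path (or pinning it in dependencyManagement, as the snippet above does) is what lets the newer version through.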
>
> On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu <pi...@gmail.com> wrote:
>
> Hi Deepak,
> Following your suggestion, I put exclusion of guava in topmost POM (under Spark home directly) as follows.
> 2227- </dependency>
> 2228- <dependency>
> 2229- <groupId>org.apache.hadoop</groupId>
> 2230: <artifactId>hadoop-common</artifactId>
> 2231- <version>3.2.1</version>
> 2232- <exclusions>
> 2233- <exclusion>
> 2234- <groupId>com.google.guava</groupId>
> 2235- <artifactId>guava</artifactId>
> 2236- </exclusion>
> 2237- </exclusions>
> 2238- </dependency>
> 2239- </dependencies>
> 2240- </dependencyManagement>
> I also set properties for spark.executor.userClassPathFirst=true and spark.driver.userClassPathFirst=true
> D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1 -Dspark.executor.userClassPathFirst=true -Dspark.driver.userClassPathFirst=true -DskipTests clean package
> and rebuilt spark.
> But I got the same error when running spark-shell.
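One caveat worth noting: spark.executor.userClassPathFirst and spark.driver.userClassPathFirst are Spark runtime configurations (set via --conf or spark-defaults.conf), so passing them as -D flags to the Maven build above should have no effect on the jars that get built. At runtime they flip class loading from parent-first to child-first, roughly as in this sketch (the jar names are illustrative):

```python
# Sketch of what spark.{driver,executor}.userClassPathFirst changes at
# runtime. Parent-first (the default) means Spark's own classpath wins;
# child-first means the user's jars win. Jar contents are illustrative.
SYSTEM_JARS = [{"name": "guava-14.0.1", "classes": {"Preconditions"}}]
USER_JARS = [{"name": "guava-28.1-jre", "classes": {"Preconditions"}}]

def load_class(name, user_first):
    order = USER_JARS + SYSTEM_JARS if user_first else SYSTEM_JARS + USER_JARS
    for jar in order:
        if name in jar["classes"]:
            return jar["name"]  # first jar defining the class wins
    raise ImportError(name)

print(load_class("Preconditions", user_first=False))  # guava-14.0.1
print(load_class("Preconditions", user_first=True))   # guava-28.1-jre
```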
>
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ........................... SUCCESS [ 25.092 s]
> [INFO] Spark Project Tags ................................. SUCCESS [ 22.093 s]
> [INFO] Spark Project Sketch ............................... SUCCESS [ 19.546 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [ 10.468 s]
> [INFO] Spark Project Networking ........................... SUCCESS [ 17.733 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 6.531 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [ 25.327 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [ 27.264 s]
> [INFO] Spark Project Core ................................. SUCCESS [07:59 min]
> [INFO] Spark Project ML Local Library ..................... SUCCESS [01:39 min]
> [INFO] Spark Project GraphX ............................... SUCCESS [02:08 min]
> [INFO] Spark Project Streaming ............................ SUCCESS [02:56 min]
> [INFO] Spark Project Catalyst ............................. SUCCESS [08:55 min]
> [INFO] Spark Project SQL .................................. SUCCESS [12:33 min]
> [INFO] Spark Project ML Library ........................... SUCCESS [08:49 min]
> [INFO] Spark Project Tools ................................ SUCCESS [ 16.967 s]
> [INFO] Spark Project Hive ................................. SUCCESS [06:15 min]
> [INFO] Spark Project Graph API ............................ SUCCESS [ 10.219 s]
> [INFO] Spark Project Cypher ............................... SUCCESS [ 11.952 s]
> [INFO] Spark Project Graph ................................ SUCCESS [ 11.171 s]
> [INFO] Spark Project REPL ................................. SUCCESS [ 55.029 s]
> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:07 min]
> [INFO] Spark Project YARN ................................. SUCCESS [02:22 min]
> [INFO] Spark Project Assembly ............................. SUCCESS [ 21.483 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 56.450 s]
> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:21 min]
> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:33 min]
> [INFO] Spark Project Examples ............................. SUCCESS [02:05 min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 30.780 s]
> [INFO] Spark Avro ......................................... SUCCESS [01:43 min]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 01:08 h
> [INFO] Finished at: 2019-12-06T11:43:08-08:00
> [INFO] ------------------------------------------------------------------------
>
> D:\apache\spark>spark-shell
> 'spark-shell' is not recognized as an internal or external command,
> operable program or batch file.
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit$$Lambda$132/1985836631.apply(Unknown Source)
> at scala.Option.getOrElse(Option.scala:189)
> at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Before building spark, I went to my local Maven repo and removed guava entirely. But after building, I found the same versions of guava had been downloaded.
> D:\mavenrepo\com\google\guava\guava>ls
> 14.0.1 16.0.1 18.0 19.0
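All four versions listed above predate the non-varargs checkArgument(boolean, String, Object) overload in the stack trace, which (if memory serves) only appeared around Guava 20, so any of them triggers the NoSuchMethodError when Hadoop 3.2 classes call it. Here is a toy sketch of the JVM-side linkage; the method sets are illustrative assumptions, not real jar manifests:

```python
# Toy sketch of JVM linkage: the first classpath entry defining a class
# is used; if that version lacks the exact method descriptor, linkage
# fails with NoSuchMethodError. Method sets below are assumptions.
OLD_GUAVA = {"Preconditions": {
    "checkArgument(Z)V",
    "checkArgument(ZLjava/lang/String;[Ljava/lang/Object;)V",  # varargs only
}}
NEW_GUAVA = {"Preconditions": {
    "checkArgument(Z)V",
    "checkArgument(ZLjava/lang/String;[Ljava/lang/Object;)V",
    "checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V",   # added later
}}

def link(classpath, cls, descriptor):
    for jar in classpath:
        if cls in jar:  # first jar defining the class is chosen
            if descriptor in jar[cls]:
                return "linked"
            raise RuntimeError("NoSuchMethodError: %s.%s" % (cls, descriptor))
    raise RuntimeError("NoClassDefFoundError: " + cls)
```

With an old guava first on the classpath, the call fails even if a newer guava sits later on the same classpath, which matches the behavior in the thread.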
> On Thu, Dec 5, 2019 at 5:12 PM Deepak Vohra <dv...@yahoo.com> wrote:
>
> Just to clarify, excluding the Hadoop-provided guava in pom.xml is an alternative to using an uber jar, which is a more involved process.
>
> On Thursday, December 5, 2019, 10:37:39 p.m. UTC, Ping Liu <pi...@gmail.com> wrote:
>
> Hi Sean,
> Thanks for your response!
> Sorry, I didn't mention that "build/mvn ..." doesn't work. So I did go to the Spark home directory and ran mvn from there. Following is my build and running result. The source code was just updated yesterday. I guess the POM should specify a newer Guava library somehow.
>
> Thanks Sean.
> Ping
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ........................... SUCCESS [ 14.794 s]
> [INFO] Spark Project Tags ................................. SUCCESS [ 18.233 s]
> [INFO] Spark Project Sketch ............................... SUCCESS [ 20.077 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [ 7.846 s]
> [INFO] Spark Project Networking ........................... SUCCESS [ 14.906 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 6.267 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [ 31.710 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [ 10.227 s]
> [INFO] Spark Project Core ................................. SUCCESS [08:03 min]
> [INFO] Spark Project ML Local Library ..................... SUCCESS [01:51 min]
> [INFO] Spark Project GraphX ............................... SUCCESS [02:20 min]
> [INFO] Spark Project Streaming ............................ SUCCESS [03:16 min]
> [INFO] Spark Project Catalyst ............................. SUCCESS [08:45 min]
> [INFO] Spark Project SQL .................................. SUCCESS [12:12 min]
> [INFO] Spark Project ML Library ........................... SUCCESS [ 16:28 h]
> [INFO] Spark Project Tools ................................ SUCCESS [ 23.602 s]
> [INFO] Spark Project Hive ................................. SUCCESS [07:50 min]
> [INFO] Spark Project Graph API ............................ SUCCESS [ 8.734 s]
> [INFO] Spark Project Cypher ............................... SUCCESS [ 12.420 s]
> [INFO] Spark Project Graph ................................ SUCCESS [ 10.186 s]
> [INFO] Spark Project REPL ................................. SUCCESS [01:03 min]
> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:19 min]
> [INFO] Spark Project YARN ................................. SUCCESS [02:19 min]
> [INFO] Spark Project Assembly ............................. SUCCESS [ 18.912 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 57.925 s]
> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:20 min]
> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:26 min]
> [INFO] Spark Project Examples ............................. SUCCESS [02:00 min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 28.354 s]
> [INFO] Spark Avro ......................................... SUCCESS [01:44 min]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 17:30 h
> [INFO] Finished at: 2019-12-05T12:20:01-08:00
> [INFO] ------------------------------------------------------------------------
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>ls
> beeline load-spark-env.cmd run-example spark-shell spark-sql2.cmd sparkR.cmd
> beeline.cmd load-spark-env.sh run-example.cmd spark-shell.cmd spark-submit sparkR2.cmd
> docker-image-tool.sh pyspark spark-class spark-shell2.cmd spark-submit.cmd
> find-spark-home pyspark.cmd spark-class.cmd spark-sql spark-submit2.cmd
> find-spark-home.cmd pyspark2.cmd spark-class2.cmd spark-sql.cmd sparkR
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
> at scala.Option.getOrElse(Option.scala:189)
> at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> D:\apache\spark\bin>
> On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
>
> What was the build error? you didn't say. Are you sure it succeeded?
> Try running from the Spark home dir, not bin.
> I know we do run Windows tests and it appears to pass tests, etc.
>
> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
>>
>> Hello,
>>
>> I understand Spark is preferably built on Linux. But I have a Windows machine with a slow Virtual Box for Linux. So I hope to be able to build and run Spark code in a Windows environment.
>>
>> Unfortunately,
>>
>> # Apache Hadoop 2.6.X
>> ./build/mvn -Pyarn -DskipTests clean package
>>
>> # Apache Hadoop 2.7.X and later
>> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
>>
>>
>> Both are listed on http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
>>
>> But neither works for me (I stay directly under spark root directory and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package")
>>
>>
>> Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests clean package"
>>
>> Now the build works. But when I run spark-shell, I get the following error.
>>
>> D:\apache\spark\bin>spark-shell
>> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
>> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
>> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
>> at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>> at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
>> at scala.Option.getOrElse(Option.scala:189)
>> at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
>> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
>> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>>
>> Has anyone experienced building and running Spark source code successfully on Windows? Could you please share your experience?
>>
>> Thanks a lot!
>>
>> Ping
>>
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Ping Liu <pi...@gmail.com>.
Thanks Deepak! Yes, I want to try it with Docker. But my AWS account has run
out of its free period. Is there a shared EC2 instance for Spark that we can
use for free?
Ping
On Monday, December 9, 2019, Deepak Vohra <dv...@yahoo.com> wrote:
> Haven't tested, but the general procedure is to exclude all Guava
dependencies that are not needed. The hadoop-common dependency does not have
a dependency on Guava, according to Maven Repository: org.apache.hadoop »
hadoop-common.
>
> Apache Spark 2.4 has a dependency on Guava 14.
> If a Docker image for Cloudera Hadoop is used, Spark may be installed
on Docker for Windows.
> For Docker on Windows on EC2, refer to Getting Started with Docker for
Windows - Developer.com:
>
> "Docker for Windows makes it feasible to run a Docker daemon on Windows
Server 2016. Learn to harness its power."
>
>
> Conflicting versions are not an issue if Docker is used:
> "Apache Spark applications usually have a complex set of required
software dependencies. Spark applications may require specific versions of
these dependencies (such as PySpark and R) on the Spark executor hosts,
sometimes with conflicting versions."
> Running Spark in Docker Containers on YARN
>
>
>
>
>
> On Monday, December 9, 2019, 08:37:47 p.m. UTC, Ping Liu <
pingpinganan@gmail.com> wrote:
>
> Hi Deepak,
> I tried it. Unfortunately, it still doesn't work. 28.1-jre isn't
downloaded, for some reason. I'll try something else. Thank you very much for
your help!
> Ping
>
> On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra <dv...@yahoo.com> wrote:
>
> As multiple Guava versions are found, exclude Guava from all the
dependencies it could have been pulled in with, and explicitly add a recent
Guava version.
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-common</artifactId>
>   <version>3.2.1</version>
>   <exclusions>
>     <exclusion>
>       <groupId>com.google.guava</groupId>
>       <artifactId>guava</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
> <dependency>
>   <groupId>com.google.guava</groupId>
>   <artifactId>guava</artifactId>
>   <version>28.1-jre</version>
> </dependency>
> </dependencies>
> </dependencyManagement>
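An alternative worth sketching here (hedged: it assumes the Spark 3.0 parent POM defines a guava.version property, as the source tree being built does; the 27.0-jre value is illustrative, not taken from the thread): override that property so the whole reactor agrees on one Guava, instead of excluding it dependency by dependency.

```xml
<!-- Hedged sketch: override the parent POM's guava.version property
     (assumed to exist, as it does in Spark's 3.0 build) so every module
     resolves a single Guava rather than needing per-dependency exclusions.
     27.0-jre is an illustrative value, not confirmed by the thread. -->
<properties>
  <guava.version>27.0-jre</guava.version>
</properties>
```

The same override can be attempted on the command line with -Dguava.version=27.0-jre, since Maven properties are overridable that way.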
>
> On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu <
pingpinganan@gmail.com> wrote:
>
> Hi Deepak,
> Following your suggestion, I put the Guava exclusion in the topmost POM
(directly under the Spark home) as follows.
> </dependency>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-common</artifactId>
>   <version>3.2.1</version>
>   <exclusions>
>     <exclusion>
>       <groupId>com.google.guava</groupId>
>       <artifactId>guava</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
> </dependencies>
> </dependencyManagement>
> I also set properties for spark.executor.userClassPathFirst=true and
spark.driver.userClassPathFirst=true
> D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1
-Dspark.executor.userClassPathFirst=true
-Dspark.driver.userClassPathFirst=true -DskipTests clean package
> and rebuilt spark.
> But I got the same error when running spark-shell.
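A note on those two flags (an observation about Spark's configuration model, not something stated in the thread): spark.driver.userClassPathFirst and spark.executor.userClassPathFirst are runtime settings, so passing them as -D properties to mvn cannot change the produced build. They would normally be supplied at launch time, roughly as sketched below (the command is only echoed, since running it requires a built Spark).

```shell
# spark.*.userClassPathFirst are runtime Spark configuration, read when an
# application starts, not during a Maven build. A sketch of how they are
# normally passed (echoed rather than executed, as it needs a built Spark):
launch="bin/spark-shell --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true"
echo "$launch"
```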
>
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ........................... SUCCESS [
25.092 s]
> [INFO] Spark Project Tags ................................. SUCCESS [
22.093 s]
> [INFO] Spark Project Sketch ............................... SUCCESS [
19.546 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [
10.468 s]
> [INFO] Spark Project Networking ........................... SUCCESS [
17.733 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
6.531 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [
25.327 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [
27.264 s]
> [INFO] Spark Project Core ................................. SUCCESS
[07:59 min]
> [INFO] Spark Project ML Local Library ..................... SUCCESS
[01:39 min]
> [INFO] Spark Project GraphX ............................... SUCCESS
[02:08 min]
> [INFO] Spark Project Streaming ............................ SUCCESS
[02:56 min]
> [INFO] Spark Project Catalyst ............................. SUCCESS
[08:55 min]
> [INFO] Spark Project SQL .................................. SUCCESS
[12:33 min]
> [INFO] Spark Project ML Library ........................... SUCCESS
[08:49 min]
> [INFO] Spark Project Tools ................................ SUCCESS [
16.967 s]
> [INFO] Spark Project Hive ................................. SUCCESS
[06:15 min]
> [INFO] Spark Project Graph API ............................ SUCCESS [
10.219 s]
> [INFO] Spark Project Cypher ............................... SUCCESS [
11.952 s]
> [INFO] Spark Project Graph ................................ SUCCESS [
11.171 s]
> [INFO] Spark Project REPL ................................. SUCCESS [
55.029 s]
> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS
[01:07 min]
> [INFO] Spark Project YARN ................................. SUCCESS
[02:22 min]
> [INFO] Spark Project Assembly ............................. SUCCESS [
21.483 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [
56.450 s]
> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS
[01:21 min]
> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS
[02:33 min]
> [INFO] Spark Project Examples ............................. SUCCESS
[02:05 min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [
30.780 s]
> [INFO] Spark Avro ......................................... SUCCESS
[01:43 min]
> [INFO]
------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO]
------------------------------------------------------------------------
> [INFO] Total time: 01:08 h
> [INFO] Finished at: 2019-12-06T11:43:08-08:00
> [INFO]
------------------------------------------------------------------------
>
> D:\apache\spark>spark-shell
> 'spark-shell' is not recognized as an internal or external command,
> operable program or batch file.
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError:
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at
org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at
org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at
org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at
org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at
org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at
org.apache.spark.deploy.SparkSubmit$$Lambda$132/1985836631.apply(Unknown
Source)
> at scala.Option.getOrElse(Option.scala:189)
> at
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org
$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at
org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at
org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Before building Spark, I went to my local Maven repo and removed Guava
entirely. But after building, I found the same versions of Guava had been
downloaded again.
> D:\mavenrepo\com\google\guava\guava>ls
> 14.0.1 16.0.1 18.0 19.0
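To see which modules drag those versions back in, one option (a sketch: it assumes mvn is on the PATH and is run from the Spark source root, and is guarded so it degrades to a message otherwise) is Maven's dependency tree filtered to Guava:

```shell
# Ask Maven which dependencies pull in each Guava version; this shows why
# 14.0.1/16.0.1/18.0/19.0 reappear after being deleted from the local repo.
# Guarded: only runs when mvn and a pom.xml are actually present.
if command -v mvn >/dev/null 2>&1 && [ -f pom.xml ]; then
    mvn -q dependency:tree -Dincludes=com.google.guava:guava || true
    result="ran dependency:tree"
else
    result="skipped: needs mvn on PATH and a pom.xml in the current directory"
fi
echo "$result"
```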
> On Thu, Dec 5, 2019 at 5:12 PM Deepak Vohra <dv...@yahoo.com> wrote:
>
> Just to clarify, excluding the Hadoop-provided Guava in pom.xml is an
alternative to using an uber JAR, which is a more involved process.
>
> On Thursday, December 5, 2019, 10:37:39 p.m. UTC, Ping Liu <
pingpinganan@gmail.com> wrote:
>
> Hi Sean,
> Thanks for your response!
> Sorry, I didn't mention that "build/mvn ..." doesn't work. So I went to
the Spark home directory and ran mvn from there. Below is my build and run
result. The source code was updated just yesterday. I guess the POM should
somehow specify a newer Guava library.
>
> Thanks Sean.
> Ping
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ........................... SUCCESS [
14.794 s]
> [INFO] Spark Project Tags ................................. SUCCESS [
18.233 s]
> [INFO] Spark Project Sketch ............................... SUCCESS [
20.077 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [
7.846 s]
> [INFO] Spark Project Networking ........................... SUCCESS [
14.906 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
6.267 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [
31.710 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [
10.227 s]
> [INFO] Spark Project Core ................................. SUCCESS
[08:03 min]
> [INFO] Spark Project ML Local Library ..................... SUCCESS
[01:51 min]
> [INFO] Spark Project GraphX ............................... SUCCESS
[02:20 min]
> [INFO] Spark Project Streaming ............................ SUCCESS
[03:16 min]
> [INFO] Spark Project Catalyst ............................. SUCCESS
[08:45 min]
> [INFO] Spark Project SQL .................................. SUCCESS
[12:12 min]
> [INFO] Spark Project ML Library ........................... SUCCESS [
16:28 h]
> [INFO] Spark Project Tools ................................ SUCCESS [
23.602 s]
> [INFO] Spark Project Hive ................................. SUCCESS
[07:50 min]
> [INFO] Spark Project Graph API ............................ SUCCESS [
8.734 s]
> [INFO] Spark Project Cypher ............................... SUCCESS [
12.420 s]
> [INFO] Spark Project Graph ................................ SUCCESS [
10.186 s]
> [INFO] Spark Project REPL ................................. SUCCESS
[01:03 min]
> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS
[01:19 min]
> [INFO] Spark Project YARN ................................. SUCCESS
[02:19 min]
> [INFO] Spark Project Assembly ............................. SUCCESS [
18.912 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [
57.925 s]
> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS
[01:20 min]
> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS
[02:26 min]
> [INFO] Spark Project Examples ............................. SUCCESS
[02:00 min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [
28.354 s]
> [INFO] Spark Avro ......................................... SUCCESS
[01:44 min]
> [INFO]
------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO]
------------------------------------------------------------------------
> [INFO] Total time: 17:30 h
> [INFO] Finished at: 2019-12-05T12:20:01-08:00
> [INFO]
------------------------------------------------------------------------
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>ls
> beeline load-spark-env.cmd run-example spark-shell
spark-sql2.cmd sparkR.cmd
> beeline.cmd load-spark-env.sh run-example.cmd
spark-shell.cmd spark-submit sparkR2.cmd
> docker-image-tool.sh pyspark spark-class
spark-shell2.cmd spark-submit.cmd
> find-spark-home pyspark.cmd spark-class.cmd spark-sql
spark-submit2.cmd
> find-spark-home.cmd pyspark2.cmd spark-class2.cmd spark-sql.cmd
sparkR
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError:
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at
org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at
org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at
org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at
org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at
org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at
org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown
Source)
> at scala.Option.getOrElse(Option.scala:189)
> at
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org
$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at
org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at
org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> D:\apache\spark\bin>
> On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
>
> What was the build error? You didn't say. Are you sure it succeeded?
> Try running from the Spark home dir, not bin.
> I know we run tests on Windows and they appear to pass.
>
> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
>>
>> Hello,
>>
>> I understand Spark is preferably built on Linux. But I have a Windows
machine with a slow VirtualBox Linux VM, so I hope to be able to build
and run Spark in a Windows environment.
>>
>> Unfortunately,
>>
>> # Apache Hadoop 2.6.X
>> ./build/mvn -Pyarn -DskipTests clean package
>>
>> # Apache Hadoop 2.7.X and later
>> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
package
>>
>>
>> Both are listed on
http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
>>
>> But neither works for me. (I stayed directly under the Spark root directory
and ran "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
package".)
>>
>> Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests
clean package".
>>
>> Now the build works. But when I run spark-shell, I get the following error.
>>
>> D:\apache\spark\bin>spark-shell
>> Exception in thread "main" java.lang.NoSuchMethodError:
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
>> at
org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
>> at
org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
>> at
org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
>> at
org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
>> at
org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>> at
org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown
Source)
>> at scala.Option.getOrElse(Option.scala:189)
>> at
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
>> at org.apache.spark.deploy.SparkSubmit.org
$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
>> at
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>> at
org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>> at
org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>> at
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>> at
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>>
>> Has anyone experienced building and running Spark source code
successfully on Windows? Could you please share your experience?
>>
>> Thanks a lot!
>>
>> Ping
>>
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Ping Liu <pi...@gmail.com>.
Thanks Deepak! Yes, I want to try it with Docker. But my AWS account ran
out of free period. Is there a shared EC2 for Spark that we can use for
free?
Ping
On Monday, December 9, 2019, Deepak Vohra <dv...@yahoo.com> wrote:
> Haven't tested but the general procedure is to exclude all guava
dependencies that are not needed. The hadoop-common depedency does not have
a dependency on guava according to Maven Repository: org.apache.hadoop »
hadoop-common
>
> Maven Repository: org.apache.hadoop » hadoop-common
>
> Apache Spark 2.4 has dependency on guava 14.
> If a Docker image for Cloudera Hadoop is used Spark is may be installed
on Docker for Windows.
> For Docker on Windows on EC2 refer Getting Started with Docker for
Windows - Developer.com
>
> Getting Started with Docker for Windows - Developer.com
>
> Docker for Windows makes it feasible to run a Docker daemon on Windows
Server 2016. Learn to harness its power.
>
>
> Conflicting versions is not an issue if Docker is used.
> "Apache Spark applications usually have a complex set of required
software dependencies. Spark applications may require specific versions of
these dependencies (such as Pyspark and R) on the Spark executor hosts,
sometimes with conflicting versions."
> Running Spark in Docker Containers on YARN
>
> Running Spark in Docker Containers on YARN
>
>
>
>
>
> On Monday, December 9, 2019, 08:37:47 p.m. UTC, Ping Liu <
pingpinganan@gmail.com> wrote:
>
> Hi Deepak,
> I tried it. Unfortunately, it still doesn't work. 28.1-jre isn't
downloaded for somehow. I'll try something else. Thank you very much for
your help!
> Ping
>
> On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra <dv...@yahoo.com> wrote:
>
> As multiple guava versions are found exclude guava from all the
dependecies it could have been downloaded with. And explicitly add a recent
guava version.
> <dependency>
> <groupId>org.apache.hadoop</groupId>
> <artifactId>hadoop-common</artifactId>
> <version>3.2.1</version>
> <exclusions>
> <exclusion>
> <groupId>com.google.guava</groupId>
> <artifactId>guava</artifactId>
> </exclusion>
> </exclusions>
> </dependency>
> <dependency>
> <groupId>com.google.guava</groupId>
> <artifactId>guava</artifactId>
> <version>28.1-jre</version>
> </dependency>
> </dependencies>
> </dependencyManagement>
>
> On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu <
pingpinganan@gmail.com> wrote:
>
> Hi Deepak,
> Following your suggestion, I put exclusion of guava in topmost POM (under
Spark home directly) as follows.
> 2227- </dependency>
> 2228- <dependency>
> 2229- <groupId>org.apache.hadoop</groupId>
> 2230: <artifactId>hadoop-common</artifactId>
> 2231- <version>3.2.1</version>
> 2232- <exclusions>
> 2233- <exclusion>
> 2234- <groupId>com.google.guava</groupId>
> 2235- <artifactId>guava</artifactId>
> 2236- </exclusion>
> 2237- </exclusions>
> 2238- </dependency>
> 2239- </dependencies>
> 2240- </dependencyManagement>
> I also set properties for spark.executor.userClassPathFirst=true and
spark.driver.userClassPathFirst=true
> D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1
-Dspark.executor.userClassPathFirst=true
-Dspark.driver.userClassPathFirst=true -DskipTests clean package
> and rebuilt spark.
> But I got the same error when running spark-shell.
>
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ........................... SUCCESS [
25.092 s]
> [INFO] Spark Project Tags ................................. SUCCESS [
22.093 s]
> [INFO] Spark Project Sketch ............................... SUCCESS [
19.546 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [
10.468 s]
> [INFO] Spark Project Networking ........................... SUCCESS [
17.733 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
6.531 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [
25.327 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [
27.264 s]
> [INFO] Spark Project Core ................................. SUCCESS
[07:59 min]
> [INFO] Spark Project ML Local Library ..................... SUCCESS
[01:39 min]
> [INFO] Spark Project GraphX ............................... SUCCESS
[02:08 min]
> [INFO] Spark Project Streaming ............................ SUCCESS
[02:56 min]
> [INFO] Spark Project Catalyst ............................. SUCCESS
[08:55 min]
> [INFO] Spark Project SQL .................................. SUCCESS
[12:33 min]
> [INFO] Spark Project ML Library ........................... SUCCESS
[08:49 min]
> [INFO] Spark Project Tools ................................ SUCCESS [
16.967 s]
> [INFO] Spark Project Hive ................................. SUCCESS
[06:15 min]
> [INFO] Spark Project Graph API ............................ SUCCESS [
10.219 s]
> [INFO] Spark Project Cypher ............................... SUCCESS [
11.952 s]
> [INFO] Spark Project Graph ................................ SUCCESS [
11.171 s]
> [INFO] Spark Project REPL ................................. SUCCESS [
55.029 s]
> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS
[01:07 min]
> [INFO] Spark Project YARN ................................. SUCCESS
[02:22 min]
> [INFO] Spark Project Assembly ............................. SUCCESS [
21.483 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [
56.450 s]
> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS
[01:21 min]
> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS
[02:33 min]
> [INFO] Spark Project Examples ............................. SUCCESS
[02:05 min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [
30.780 s]
> [INFO] Spark Avro ......................................... SUCCESS
[01:43 min]
> [INFO]
------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO]
------------------------------------------------------------------------
> [INFO] Total time: 01:08 h
> [INFO] Finished at: 2019-12-06T11:43:08-08:00
> [INFO]
------------------------------------------------------------------------
>
> D:\apache\spark>spark-shell
> 'spark-shell' is not recognized as an internal or external command,
> operable program or batch file.
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError:
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at
org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at
org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at
org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at
org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at
org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at
org.apache.spark.deploy.SparkSubmit$$Lambda$132/1985836631.apply(Unknown
Source)
> at scala.Option.getOrElse(Option.scala:189)
> at
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org
$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at
org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at
org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Before building spark, I went to my local Maven repo and removed guava at
all. But after building, I found the same versions of guava have been
downloaded.
> D:\mavenrepo\com\google\guava\guava>ls
> 14.0.1 16.0.1 18.0 19.0
> On Thu, Dec 5, 2019 at 5:12 PM Deepak Vohra <dv...@yahoo.com> wrote:
>
> Just to clarify, excluding Hadoop provided guava in pom.xml is an
alternative to using an Uber jar, which is a more involved process.
>
> On Thursday, December 5, 2019, 10:37:39 p.m. UTC, Ping Liu <
pingpinganan@gmail.com> wrote:
>
> Hi Sean,
> Thanks for your response!
> Sorry, I didn't mention that "build/mvn ..." doesn't work. So I did go
to Spark home directory and ran mvn from there. Following is my build and
running result. The source code was just updated yesterday. I guess the
POM should specify newer Guava library somehow.
>
> Thanks Sean.
> Ping
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ........................... SUCCESS [
14.794 s]
> [INFO] Spark Project Tags ................................. SUCCESS [
18.233 s]
> [INFO] Spark Project Sketch ............................... SUCCESS [
20.077 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [
7.846 s]
> [INFO] Spark Project Networking ........................... SUCCESS [
14.906 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
6.267 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [
31.710 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [
10.227 s]
> [INFO] Spark Project Core ................................. SUCCESS
[08:03 min]
> [INFO] Spark Project ML Local Library ..................... SUCCESS
[01:51 min]
> [INFO] Spark Project GraphX ............................... SUCCESS
[02:20 min]
> [INFO] Spark Project Streaming ............................ SUCCESS
[03:16 min]
> [INFO] Spark Project Catalyst ............................. SUCCESS
[08:45 min]
> [INFO] Spark Project SQL .................................. SUCCESS
[12:12 min]
> [INFO] Spark Project ML Library ........................... SUCCESS [
16:28 h]
> [INFO] Spark Project Tools ................................ SUCCESS [
23.602 s]
> [INFO] Spark Project Hive ................................. SUCCESS
[07:50 min]
> [INFO] Spark Project Graph API ............................ SUCCESS [
8.734 s]
> [INFO] Spark Project Cypher ............................... SUCCESS [
12.420 s]
> [INFO] Spark Project Graph ................................ SUCCESS [
10.186 s]
> [INFO] Spark Project REPL ................................. SUCCESS
[01:03 min]
> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS
[01:19 min]
> [INFO] Spark Project YARN ................................. SUCCESS
[02:19 min]
> [INFO] Spark Project Assembly ............................. SUCCESS [
18.912 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [
57.925 s]
> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS
[01:20 min]
> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS
[02:26 min]
> [INFO] Spark Project Examples ............................. SUCCESS
[02:00 min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [
28.354 s]
> [INFO] Spark Avro ......................................... SUCCESS
[01:44 min]
> [INFO]
------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO]
------------------------------------------------------------------------
> [INFO] Total time: 17:30 h
> [INFO] Finished at: 2019-12-05T12:20:01-08:00
> [INFO]
------------------------------------------------------------------------
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>ls
> beeline load-spark-env.cmd run-example spark-shell
spark-sql2.cmd sparkR.cmd
> beeline.cmd load-spark-env.sh run-example.cmd
spark-shell.cmd spark-submit sparkR2.cmd
> docker-image-tool.sh pyspark spark-class
spark-shell2.cmd spark-submit.cmd
> find-spark-home pyspark.cmd spark-class.cmd spark-sql
spark-submit2.cmd
> find-spark-home.cmd pyspark2.cmd spark-class2.cmd spark-sql.cmd
sparkR
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError:
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at
org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at
org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at
org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at
org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at
org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at
org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown
Source)
> at scala.Option.getOrElse(Option.scala:189)
> at
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org
$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at
org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at
org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> D:\apache\spark\bin>
> On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
>
> What was the build error? you didn't say. Are you sure it succeeded?
> Try running from the Spark home dir, not bin.
> I know we do run Windows tests and it appears to pass tests, etc.
>
> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
>>
>> Hello,
>>
>> I understand Spark is preferably built on Linux. But I have a Windows
machine with a slow Virtual Box for Linux. So I wish I am able to build
and run Spark code on Windows environment.
>>
>> Unfortunately,
>>
>> # Apache Hadoop 2.6.X
>> ./build/mvn -Pyarn -DskipTests clean package
>>
>> # Apache Hadoop 2.7.X and later
>> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
package
>>
>>
>> Both are listed on
http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
>>
>> But neither works for me (I stay directly under spark root directory and
run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
package"
>>
>> and
>>
>> Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests
clean package"
>>
>> Now build works. But when I run spark-shell. I got the following error.
>>
>> D:\apache\spark\bin>spark-shell
>> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
>> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
>> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
>> at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>> at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
>> at scala.Option.getOrElse(Option.scala:189)
>> at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
>> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
>> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>>
>> Has anyone experienced building and running Spark source code
successfully on Windows? Could you please share your experience?
>>
>> Thanks a lot!
>>
>> Ping
>>
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Deepak Vohra <dv...@yahoo.com.INVALID>.
Haven't tested, but the general procedure is to exclude all Guava dependencies that are not needed. The hadoop-common dependency does not have a dependency on Guava according to the Maven Repository listing for org.apache.hadoop » hadoop-common.
Apache Spark 2.4 has a dependency on Guava 14.
If a Docker image for Cloudera Hadoop is used, Spark may be installed on Docker for Windows.
For Docker on Windows on EC2, refer to Getting Started with Docker for Windows (Developer.com).
Conflicting versions are not an issue if Docker is used.
"Apache Spark applications usually have a complex set of required software dependencies. Spark applications may require specific versions of these dependencies (such as Pyspark and R) on the Spark executor hosts, sometimes with conflicting versions." (Running Spark in Docker Containers on YARN)
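The failing signature in the trace, checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V, is the checkArgument(boolean, String, Object) overload that newer Guava releases provide but Guava 14 does not, so the error suggests an old Guava is winning on the runtime classpath. A quick way to see which modules pull in which Guava version is Maven's dependency tree (an illustrative invocation, assuming a Maven-based checkout of Spark):

```shell
# Print every module's Guava dependency and the path it arrives through.
./build/mvn dependency:tree -Dincludes=com.google.guava:guava
```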
On Monday, December 9, 2019, 08:37:47 p.m. UTC, Ping Liu <pi...@gmail.com> wrote:
Hi Deepak,
I tried it. Unfortunately, it still doesn't work. 28.1-jre isn't downloaded, for some reason. I'll try something else. Thank you very much for your help!
Ping
On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra <dv...@yahoo.com> wrote:
As multiple Guava versions are found, exclude Guava from all the dependencies it could have been downloaded with, and explicitly add a recent Guava version.
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>3.2.1</version>
  <exclusions>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>28.1-jre</version>
</dependency>
</dependencies>
</dependencyManagement>
On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu <pi...@gmail.com> wrote:
Hi Deepak,
Following your suggestion, I put exclusion of guava in topmost POM (under Spark home directly) as follows.
2227- </dependency>
2228- <dependency>
2229- <groupId>org.apache.hadoop</groupId>
2230: <artifactId>hadoop-common</artifactId>
2231- <version>3.2.1</version>
2232- <exclusions>
2233- <exclusion>
2234- <groupId>com.google.guava</groupId>
2235- <artifactId>guava</artifactId>
2236- </exclusion>
2237- </exclusions>
2238- </dependency>
2239- </dependencies>
2240- </dependencyManagement>
I also set properties for spark.executor.userClassPathFirst=true and spark.driver.userClassPathFirst=true
D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1 -Dspark.executor.userClassPathFirst=true -Dspark.driver.userClassPathFirst=true -DskipTests clean package
and rebuilt spark.
But I got the same error when running spark-shell.
[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 25.092 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 22.093 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 19.546 s]
[INFO] Spark Project Local DB ............................. SUCCESS [ 10.468 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 17.733 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 6.531 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 25.327 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 27.264 s]
[INFO] Spark Project Core ................................. SUCCESS [07:59 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [01:39 min]
[INFO] Spark Project GraphX ............................... SUCCESS [02:08 min]
[INFO] Spark Project Streaming ............................ SUCCESS [02:56 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [08:55 min]
[INFO] Spark Project SQL .................................. SUCCESS [12:33 min]
[INFO] Spark Project ML Library ........................... SUCCESS [08:49 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 16.967 s]
[INFO] Spark Project Hive ................................. SUCCESS [06:15 min]
[INFO] Spark Project Graph API ............................ SUCCESS [ 10.219 s]
[INFO] Spark Project Cypher ............................... SUCCESS [ 11.952 s]
[INFO] Spark Project Graph ................................ SUCCESS [ 11.171 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 55.029 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:07 min]
[INFO] Spark Project YARN ................................. SUCCESS [02:22 min]
[INFO] Spark Project Assembly ............................. SUCCESS [ 21.483 s]
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 56.450 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:21 min]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:33 min]
[INFO] Spark Project Examples ............................. SUCCESS [02:05 min]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 30.780 s]
[INFO] Spark Avro ......................................... SUCCESS [01:43 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:08 h
[INFO] Finished at: 2019-12-06T11:43:08-08:00
[INFO] ------------------------------------------------------------------------
D:\apache\spark>spark-shell
'spark-shell' is not recognized as an internal or external command,
operable program or batch file.
D:\apache\spark>cd bin
D:\apache\spark\bin>spark-shell
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
at org.apache.spark.deploy.SparkSubmit$$Lambda$132/1985836631.apply(Unknown Source)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Before building Spark, I went to my local Maven repo and removed Guava entirely. But after building, I found the same versions of Guava had been downloaded again.
D:\mavenrepo\com\google\guava\guava>ls
14.0.1 16.0.1 18.0 19.0
On Thu, Dec 5, 2019 at 5:12 PM Deepak Vohra <dv...@yahoo.com> wrote:
Just to clarify, excluding the Hadoop-provided Guava in pom.xml is an alternative to using an uber jar, which is a more involved process.
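For reference, the uber-jar alternative mentioned here typically means shading: bundling your application's Guava under a relocated package name so it cannot collide with Hadoop's copy. This applies to an application POM rather than Spark's own build. A minimal, illustrative maven-shade-plugin configuration (plugin version and relocation prefix are assumptions):

```xml
<!-- Sketch only: shade-and-relocate Guava into an application uber jar.
     Plugin version and shadedPattern prefix are illustrative. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```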
On Thursday, December 5, 2019, 10:37:39 p.m. UTC, Ping Liu <pi...@gmail.com> wrote:
Hi Sean,
Thanks for your response!
Sorry, I didn't mention that "build/mvn ..." doesn't work. So I did go to Spark home directory and ran mvn from there. Following is my build and running result. The source code was just updated yesterday. I guess the POM should specify newer Guava library somehow.
Thanks Sean.
Ping
[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 14.794 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 18.233 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 20.077 s]
[INFO] Spark Project Local DB ............................. SUCCESS [ 7.846 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 14.906 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 6.267 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 31.710 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 10.227 s]
[INFO] Spark Project Core ................................. SUCCESS [08:03 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [01:51 min]
[INFO] Spark Project GraphX ............................... SUCCESS [02:20 min]
[INFO] Spark Project Streaming ............................ SUCCESS [03:16 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [08:45 min]
[INFO] Spark Project SQL .................................. SUCCESS [12:12 min]
[INFO] Spark Project ML Library ........................... SUCCESS [ 16:28 h]
[INFO] Spark Project Tools ................................ SUCCESS [ 23.602 s]
[INFO] Spark Project Hive ................................. SUCCESS [07:50 min]
[INFO] Spark Project Graph API ............................ SUCCESS [ 8.734 s]
[INFO] Spark Project Cypher ............................... SUCCESS [ 12.420 s]
[INFO] Spark Project Graph ................................ SUCCESS [ 10.186 s]
[INFO] Spark Project REPL ................................. SUCCESS [01:03 min]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:19 min]
[INFO] Spark Project YARN ................................. SUCCESS [02:19 min]
[INFO] Spark Project Assembly ............................. SUCCESS [ 18.912 s]
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 57.925 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:20 min]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:26 min]
[INFO] Spark Project Examples ............................. SUCCESS [02:00 min]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 28.354 s]
[INFO] Spark Avro ......................................... SUCCESS [01:44 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 17:30 h
[INFO] Finished at: 2019-12-05T12:20:01-08:00
[INFO] ------------------------------------------------------------------------
D:\apache\spark>cd bin
D:\apache\spark\bin>ls
beeline load-spark-env.cmd run-example spark-shell spark-sql2.cmd sparkR.cmd
beeline.cmd load-spark-env.sh run-example.cmd spark-shell.cmd spark-submit sparkR2.cmd
docker-image-tool.sh pyspark spark-class spark-shell2.cmd spark-submit.cmd
find-spark-home pyspark.cmd spark-class.cmd spark-sql spark-submit2.cmd
find-spark-home.cmd pyspark2.cmd spark-class2.cmd spark-sql.cmd sparkR
D:\apache\spark\bin>spark-shell
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
D:\apache\spark\bin>
On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
What was the build error? you didn't say. Are you sure it succeeded?
Try running from the Spark home dir, not bin.
I know we do run Windows tests and it appears to pass tests, etc.
On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
>
> Hello,
>
> I understand Spark is preferably built on Linux. But I have a Windows machine with a slow Virtual Box for Linux. So I wish I am able to build and run Spark code on Windows environment.
>
> Unfortunately,
>
> # Apache Hadoop 2.6.X
> ./build/mvn -Pyarn -DskipTests clean package
>
> # Apache Hadoop 2.7.X and later
> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
>
>
> Both are listed on http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
>
> But neither works for me (I stay directly under spark root directory and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package"
>
> and
>
> Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests clean package"
>
> Now build works. But when I run spark-shell. I got the following error.
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
> at scala.Option.getOrElse(Option.scala:189)
> at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
> Has anyone experienced building and running Spark source code successfully on Windows? Could you please share your experience?
>
> Thanks a lot!
>
> Ping
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Ping Liu <pi...@gmail.com>.
Hi Deepak,
I tried it. Unfortunately, it still doesn't work. 28.1-jre isn't downloaded, for some reason. I'll try something else. Thank you very much for your help!
Ping
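One plausible reason the 28.1-jre jar never downloads: a <dependencyManagement> entry only pins the version of a dependency that some module also declares directly; by itself it does not add the artifact to any module. A sketch of the direct declaration that would also be needed in the consuming module's own <dependencies> section (the module choice is an assumption):

```xml
<!-- Illustrative fragment for a module's own <dependencies> block.
     The version is then inherited from dependencyManagement in the parent POM. -->
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
</dependency>
```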
On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra <dv...@yahoo.com> wrote:
> As multiple Guava versions are found, exclude Guava from all the
> dependencies it could have been downloaded with, and explicitly add a recent
> Guava version.
>
> <dependency>
> <groupId>org.apache.hadoop</groupId>
> <artifactId>hadoop-common</artifactId>
> <version>3.2.1</version>
> <exclusions>
> <exclusion>
> <groupId>com.google.guava</groupId>
> <artifactId>guava</artifactId>
> </exclusion>
> </exclusions>
> </dependency>
> <dependency>
> <groupId>com.google.guava</groupId>
> <artifactId>guava</artifactId>
> <version>28.1-jre</version>
> </dependency>
> </dependencies>
> </dependencyManagement>
>
>
> On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
>
>
> Hi Deepak,
>
> Following your suggestion, I put exclusion of guava in topmost POM (under
> Spark home directly) as follows.
>
> 2227- </dependency>
> 2228- <dependency>
> 2229- <groupId>org.apache.hadoop</groupId>
> 2230: <artifactId>hadoop-common</artifactId>
> 2231- <version>3.2.1</version>
> 2232- <exclusions>
> 2233- <exclusion>
> 2234- <groupId>com.google.guava</groupId>
> 2235- <artifactId>guava</artifactId>
> 2236- </exclusion>
> 2237- </exclusions>
> 2238- </dependency>
> 2239- </dependencies>
> 2240- </dependencyManagement>
>
> I also set properties for spark.executor.userClassPathFirst=true and
> spark.driver.userClassPathFirst=true
>
> D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1
> -Dspark.executor.userClassPathFirst=true
> -Dspark.driver.userClassPathFirst=true -DskipTests clean package
>
> and rebuilt spark.
>
> But I got the same error when running spark-shell.
>
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ........................... SUCCESS [
> 25.092 s]
> [INFO] Spark Project Tags ................................. SUCCESS [
> 22.093 s]
> [INFO] Spark Project Sketch ............................... SUCCESS [
> 19.546 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [
> 10.468 s]
> [INFO] Spark Project Networking ........................... SUCCESS [
> 17.733 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
> 6.531 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [
> 25.327 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [
> 27.264 s]
> [INFO] Spark Project Core ................................. SUCCESS [07:59
> min]
> [INFO] Spark Project ML Local Library ..................... SUCCESS [01:39
> min]
> [INFO] Spark Project GraphX ............................... SUCCESS [02:08
> min]
> [INFO] Spark Project Streaming ............................ SUCCESS [02:56
> min]
> [INFO] Spark Project Catalyst ............................. SUCCESS [08:55
> min]
> [INFO] Spark Project SQL .................................. SUCCESS [12:33
> min]
> [INFO] Spark Project ML Library ........................... SUCCESS [08:49
> min]
> [INFO] Spark Project Tools ................................ SUCCESS [
> 16.967 s]
> [INFO] Spark Project Hive ................................. SUCCESS [06:15
> min]
> [INFO] Spark Project Graph API ............................ SUCCESS [
> 10.219 s]
> [INFO] Spark Project Cypher ............................... SUCCESS [
> 11.952 s]
> [INFO] Spark Project Graph ................................ SUCCESS [
> 11.171 s]
> [INFO] Spark Project REPL ................................. SUCCESS [
> 55.029 s]
> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:07
> min]
> [INFO] Spark Project YARN ................................. SUCCESS [02:22
> min]
> [INFO] Spark Project Assembly ............................. SUCCESS [
> 21.483 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [
> 56.450 s]
> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:21
> min]
> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:33
> min]
> [INFO] Spark Project Examples ............................. SUCCESS [02:05
> min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [
> 30.780 s]
> [INFO] Spark Avro ......................................... SUCCESS [01:43
> min]
> [INFO]
> ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Total time: 01:08 h
> [INFO] Finished at: 2019-12-06T11:43:08-08:00
> [INFO]
> ------------------------------------------------------------------------
>
> D:\apache\spark>spark-shell
> 'spark-shell' is not recognized as an internal or external command,
> operable program or batch file.
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/1985836631.apply(Unknown
> Source)
> at scala.Option.getOrElse(Option.scala:189)
> at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> Before building spark, I went to my local Maven repo and removed guava at
> all. But after building, I found the same versions of guava have been
> downloaded.
>
> D:\mavenrepo\com\google\guava\guava>ls
> 14.0.1 16.0.1 18.0 19.0
>
> On Thu, Dec 5, 2019 at 5:12 PM Deepak Vohra <dv...@yahoo.com> wrote:
>
> Just to clarify, excluding Hadoop provided guava in pom.xml is an
> alternative to using an Uber jar, which is a more involved process.
>
> On Thursday, December 5, 2019, 10:37:39 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
>
>
> Hi Sean,
>
> Thanks for your response!
>
> Sorry, I didn't mention that "build/mvn ..." doesn't work. So I did go to
> Spark home directory and ran mvn from there. Following is my build and
> running result. The source code was just updated yesterday. I guess the
> POM should specify newer Guava library somehow.
>
> Thanks Sean.
>
> Ping
>
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ........................... SUCCESS [
> 14.794 s]
> [INFO] Spark Project Tags ................................. SUCCESS [
> 18.233 s]
> [INFO] Spark Project Sketch ............................... SUCCESS [
> 20.077 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [
> 7.846 s]
> [INFO] Spark Project Networking ........................... SUCCESS [
> 14.906 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
> 6.267 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [
> 31.710 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [
> 10.227 s]
> [INFO] Spark Project Core ................................. SUCCESS [08:03
> min]
> [INFO] Spark Project ML Local Library ..................... SUCCESS [01:51
> min]
> [INFO] Spark Project GraphX ............................... SUCCESS [02:20
> min]
> [INFO] Spark Project Streaming ............................ SUCCESS [03:16
> min]
> [INFO] Spark Project Catalyst ............................. SUCCESS [08:45
> min]
> [INFO] Spark Project SQL .................................. SUCCESS [12:12
> min]
> [INFO] Spark Project ML Library ........................... SUCCESS [
> 16:28 h]
> [INFO] Spark Project Tools ................................ SUCCESS [
> 23.602 s]
> [INFO] Spark Project Hive ................................. SUCCESS [07:50
> min]
> [INFO] Spark Project Graph API ............................ SUCCESS [
> 8.734 s]
> [INFO] Spark Project Cypher ............................... SUCCESS [
> 12.420 s]
> [INFO] Spark Project Graph ................................ SUCCESS [
> 10.186 s]
> [INFO] Spark Project REPL ................................. SUCCESS [01:03
> min]
> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:19
> min]
> [INFO] Spark Project YARN ................................. SUCCESS [02:19
> min]
> [INFO] Spark Project Assembly ............................. SUCCESS [
> 18.912 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [
> 57.925 s]
> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:20
> min]
> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:26
> min]
> [INFO] Spark Project Examples ............................. SUCCESS [02:00
> min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [
> 28.354 s]
> [INFO] Spark Avro ......................................... SUCCESS [01:44
> min]
> [INFO]
> ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Total time: 17:30 h
> [INFO] Finished at: 2019-12-05T12:20:01-08:00
> [INFO]
> ------------------------------------------------------------------------
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>ls
> beeline load-spark-env.cmd run-example spark-shell
> spark-sql2.cmd sparkR.cmd
> beeline.cmd load-spark-env.sh run-example.cmd
> spark-shell.cmd spark-submit sparkR2.cmd
> docker-image-tool.sh pyspark spark-class
> spark-shell2.cmd spark-submit.cmd
> find-spark-home pyspark.cmd spark-class.cmd spark-sql
> spark-submit2.cmd
> find-spark-home.cmd pyspark2.cmd spark-class2.cmd spark-sql.cmd
> sparkR
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown
> Source)
> at scala.Option.getOrElse(Option.scala:189)
> at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> D:\apache\spark\bin>
>
> On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
>
> What was the build error? you didn't say. Are you sure it succeeded?
> Try running from the Spark home dir, not bin.
> I know we do run Windows tests and it appears to pass tests, etc.
>
> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
> >
> > Hello,
> >
> > I understand Spark is preferably built on Linux. But I have a Windows
> machine with a slow Virtual Box for Linux. So I wish I am able to build
> and run Spark code on Windows environment.
> >
> > Unfortunately,
> >
> > # Apache Hadoop 2.6.X
> > ./build/mvn -Pyarn -DskipTests clean package
> >
> > # Apache Hadoop 2.7.X and later
> > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package
> >
> >
> > Both are listed on
> http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
> >
> > But neither works for me (I stay directly under spark root directory and
> run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package"
> >
> > and
> >
> > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests
> clean package"
> >
> > Now build works. But when I run spark-shell. I got the following error.
> >
> > D:\apache\spark\bin>spark-shell
> > Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> >         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> >         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> >         at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> >         at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> >         at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> >         at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
> >         at scala.Option.getOrElse(Option.scala:189)
> >         at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> >         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> >
> >
> > Has anyone experienced building and running Spark source code
> successfully on Windows? Could you please share your experience?
> >
> > Thanks a lot!
> >
> > Ping
> >
>
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Ping Liu <pi...@gmail.com>.
Hi Deepak,
I tried it. Unfortunately, it still doesn't work; 28.1-jre isn't
downloaded, for some reason. I'll try something else. Thank you very much
for your help!
Ping
On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra <dv...@yahoo.com> wrote:
> As multiple Guava versions are found, exclude Guava from all the
> dependencies it could have been pulled in with, and explicitly add a recent
> Guava version.
>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-common</artifactId>
>   <version>3.2.1</version>
>   <exclusions>
>     <exclusion>
>       <groupId>com.google.guava</groupId>
>       <artifactId>guava</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
> <dependency>
>   <groupId>com.google.guava</groupId>
>   <artifactId>guava</artifactId>
>   <version>28.1-jre</version>
> </dependency>
> </dependencies>
> </dependencyManagement>
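A quick way to confirm where each conflicting Guava comes from is Maven's dependency tree. The sketch below is illustrative only: the sample dump stands in for real `mvn dependency:tree -Dincludes=com.google.guava` output, and the module names and versions in it are assumptions, not taken from this build.

```shell
# Stand-in for real output of: mvn dependency:tree -Dincludes=com.google.guava
# (the modules and versions below are illustrative assumptions)
cat > /tmp/guava-deptree.txt <<'EOF'
[INFO] +- org.apache.hadoop:hadoop-common:jar:3.2.1:compile
[INFO] |  +- com.google.guava:guava:jar:27.0-jre:compile
[INFO] +- org.apache.curator:curator-recipes:jar:2.13.0:compile
[INFO] |  \- com.google.guava:guava:jar:14.0.1:compile
EOF

# List every distinct Guava version Maven resolved; each line in the result
# points at a dependency that needs an <exclusion> like the one above.
grep -o 'com\.google\.guava:guava:jar:[^:]*' /tmp/guava-deptree.txt | sort -u
```

Running the real goal from the Spark root directory would show which modules drag in each of the old versions.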
>
>
> On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
>
>
> Hi Deepak,
>
> Following your suggestion, I put exclusion of guava in topmost POM (under
> Spark home directly) as follows.
>
> </dependency>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-common</artifactId>
>   <version>3.2.1</version>
>   <exclusions>
>     <exclusion>
>       <groupId>com.google.guava</groupId>
>       <artifactId>guava</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
> </dependencies>
> </dependencyManagement>
>
> I also set properties for spark.executor.userClassPathFirst=true and
> spark.driver.userClassPathFirst=true
>
> D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1
> -Dspark.executor.userClassPathFirst=true
> -Dspark.driver.userClassPathFirst=true -DskipTests clean package
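Worth noting: `spark.driver.userClassPathFirst` and `spark.executor.userClassPathFirst` are Spark runtime settings, not Maven build properties, so passing them as `-D` flags to mvn has no effect on the launched shell. A minimal sketch of where they would go instead; the jar path below is a placeholder, not a path from this thread:

```
# conf/spark-defaults.conf (sketch; the jar path is a placeholder)
spark.driver.userClassPathFirst     true
spark.executor.userClassPathFirst   true
spark.driver.extraClassPath         C:/path/to/guava-28.1-jre.jar
```

The same settings can also be passed ad hoc with `--conf` on the spark-shell command line.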
>
> and rebuilt spark.
>
> But I got the same error when running spark-shell.
>
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ........................... SUCCESS [ 25.092 s]
> [INFO] Spark Project Tags ................................. SUCCESS [ 22.093 s]
> [INFO] Spark Project Sketch ............................... SUCCESS [ 19.546 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [ 10.468 s]
> [INFO] Spark Project Networking ........................... SUCCESS [ 17.733 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  6.531 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [ 25.327 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [ 27.264 s]
> [INFO] Spark Project Core ................................. SUCCESS [07:59 min]
> [INFO] Spark Project ML Local Library ..................... SUCCESS [01:39 min]
> [INFO] Spark Project GraphX ............................... SUCCESS [02:08 min]
> [INFO] Spark Project Streaming ............................ SUCCESS [02:56 min]
> [INFO] Spark Project Catalyst ............................. SUCCESS [08:55 min]
> [INFO] Spark Project SQL .................................. SUCCESS [12:33 min]
> [INFO] Spark Project ML Library ........................... SUCCESS [08:49 min]
> [INFO] Spark Project Tools ................................ SUCCESS [ 16.967 s]
> [INFO] Spark Project Hive ................................. SUCCESS [06:15 min]
> [INFO] Spark Project Graph API ............................ SUCCESS [ 10.219 s]
> [INFO] Spark Project Cypher ............................... SUCCESS [ 11.952 s]
> [INFO] Spark Project Graph ................................ SUCCESS [ 11.171 s]
> [INFO] Spark Project REPL ................................. SUCCESS [ 55.029 s]
> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:07 min]
> [INFO] Spark Project YARN ................................. SUCCESS [02:22 min]
> [INFO] Spark Project Assembly ............................. SUCCESS [ 21.483 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 56.450 s]
> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:21 min]
> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:33 min]
> [INFO] Spark Project Examples ............................. SUCCESS [02:05 min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 30.780 s]
> [INFO] Spark Avro ......................................... SUCCESS [01:43 min]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 01:08 h
> [INFO] Finished at: 2019-12-06T11:43:08-08:00
> [INFO] ------------------------------------------------------------------------
>
> D:\apache\spark>spark-shell
> 'spark-shell' is not recognized as an internal or external command,
> operable program or batch file.
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
>         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
>         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
>         at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
>         at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
>         at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>         at org.apache.spark.deploy.SparkSubmit$$Lambda$132/1985836631.apply(Unknown Source)
>         at scala.Option.getOrElse(Option.scala:189)
>         at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
>         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
>         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> Before building spark, I went to my local Maven repo and removed guava
> entirely. But after building, I found the same versions of guava had been
> downloaded again.
>
> D:\mavenrepo\com\google\guava\guava>ls
> 14.0.1 16.0.1 18.0 19.0
>
> On Thu, Dec 5, 2019 at 5:12 PM Deepak Vohra <dv...@yahoo.com> wrote:
>
> Just to clarify, excluding the Hadoop-provided Guava in pom.xml is an
> alternative to building an uber jar, which is a more involved process.
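The uber-jar route usually means shading: bundling a private, relocated copy of Guava so classes compiled against a newer Guava still find it regardless of what else is on the classpath. A rough sketch of what that looks like with the maven-shade-plugin; the shaded package name and plugin version here are illustrative choices, not taken from Spark's build:

```
<!-- Sketch of maven-shade-plugin relocation (illustrative, not from Spark's POM) -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Move Guava to a private package so it cannot clash -->
            <pattern>com.google.common</pattern>
            <shadedPattern>shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Relocation rewrites the bytecode references inside the shaded jar, which is why it sidesteps version conflicts entirely, at the cost of a more involved build.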
>
> On Thursday, December 5, 2019, 10:37:39 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
>
>
> Hi Sean,
>
> Thanks for your response!
>
> Sorry, I didn't mention that "build/mvn ..." doesn't work, so I went to the
> Spark home directory and ran mvn from there. Below are my build and run
> results. The source code was updated just yesterday. I guess the POM should
> specify a newer Guava library somehow.
>
> Thanks Sean.
>
> Ping
>
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ........................... SUCCESS [ 14.794 s]
> [INFO] Spark Project Tags ................................. SUCCESS [ 18.233 s]
> [INFO] Spark Project Sketch ............................... SUCCESS [ 20.077 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [  7.846 s]
> [INFO] Spark Project Networking ........................... SUCCESS [ 14.906 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  6.267 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [ 31.710 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [ 10.227 s]
> [INFO] Spark Project Core ................................. SUCCESS [08:03 min]
> [INFO] Spark Project ML Local Library ..................... SUCCESS [01:51 min]
> [INFO] Spark Project GraphX ............................... SUCCESS [02:20 min]
> [INFO] Spark Project Streaming ............................ SUCCESS [03:16 min]
> [INFO] Spark Project Catalyst ............................. SUCCESS [08:45 min]
> [INFO] Spark Project SQL .................................. SUCCESS [12:12 min]
> [INFO] Spark Project ML Library ........................... SUCCESS [ 16:28 h]
> [INFO] Spark Project Tools ................................ SUCCESS [ 23.602 s]
> [INFO] Spark Project Hive ................................. SUCCESS [07:50 min]
> [INFO] Spark Project Graph API ............................ SUCCESS [  8.734 s]
> [INFO] Spark Project Cypher ............................... SUCCESS [ 12.420 s]
> [INFO] Spark Project Graph ................................ SUCCESS [ 10.186 s]
> [INFO] Spark Project REPL ................................. SUCCESS [01:03 min]
> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:19 min]
> [INFO] Spark Project YARN ................................. SUCCESS [02:19 min]
> [INFO] Spark Project Assembly ............................. SUCCESS [ 18.912 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 57.925 s]
> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:20 min]
> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:26 min]
> [INFO] Spark Project Examples ............................. SUCCESS [02:00 min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 28.354 s]
> [INFO] Spark Avro ......................................... SUCCESS [01:44 min]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 17:30 h
> [INFO] Finished at: 2019-12-05T12:20:01-08:00
> [INFO] ------------------------------------------------------------------------
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>ls
> beeline                load-spark-env.cmd    run-example         spark-shell         spark-sql2.cmd      sparkR.cmd
> beeline.cmd            load-spark-env.sh     run-example.cmd     spark-shell.cmd     spark-submit        sparkR2.cmd
> docker-image-tool.sh   pyspark               spark-class         spark-shell2.cmd    spark-submit.cmd
> find-spark-home        pyspark.cmd           spark-class.cmd     spark-sql           spark-submit2.cmd
> find-spark-home.cmd    pyspark2.cmd          spark-class2.cmd    spark-sql.cmd       sparkR
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
>         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
>         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
>         at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
>         at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
>         at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>         at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
>         at scala.Option.getOrElse(Option.scala:189)
>         at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
>         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
>         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> D:\apache\spark\bin>
>
> On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
>
What was the build error? You didn't say. Are you sure it succeeded?
> Try running from the Spark home dir, not bin.
> I know we do run Windows tests and it appears to pass tests, etc.
>
> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
> >
> > Hello,
> >
> > I understand Spark is preferably built on Linux. But I have a Windows
> > machine with a slow VirtualBox VM for Linux, so I would like to be able
> > to build and run Spark in a Windows environment.
> >
> > Unfortunately,
> >
> > # Apache Hadoop 2.6.X
> > ./build/mvn -Pyarn -DskipTests clean package
> >
> > # Apache Hadoop 2.7.X and later
> > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package
> >
> >
> > Both are listed on
> http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
> >
> > But neither works for me. (I stay directly under the Spark root directory
> > and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> > package".)
> >
> > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests
> clean package"
> >
> > Now the build works. But when I run spark-shell, I get the following error.
> >
> > D:\apache\spark\bin>spark-shell
> > Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> >         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> >         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> >         at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> >         at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> >         at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> >         at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
> >         at scala.Option.getOrElse(Option.scala:189)
> >         at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> >         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> >         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> >         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> >         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> >         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> >         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> >         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >
> >
> > Has anyone experienced building and running Spark source code
> successfully on Windows? Could you please share your experience?
> >
> > Thanks a lot!
> >
> > Ping
> >
>
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Deepak Vohra <dv...@yahoo.com.INVALID>.
As multiple Guava versions are found, exclude Guava from all the
dependencies it could have been pulled in with, and explicitly add a recent
Guava version.

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>3.2.1</version>
  <exclusions>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>28.1-jre</version>
</dependency>
</dependencies>
</dependencyManagement>
Re: Is it feasible to build and run Spark on Windows?
Posted by Ping Liu <pi...@gmail.com>.
Hi Deepak,
Following your suggestion, I put exclusion of guava in topmost POM (under
Spark home directly) as follows.
2227- </dependency>
2228- <dependency>
2229- <groupId>org.apache.hadoop</groupId>
2230: <artifactId>hadoop-common</artifactId>
2231- <version>3.2.1</version>
2232- <exclusions>
2233- <exclusion>
2234- <groupId>com.google.guava</groupId>
2235- <artifactId>guava</artifactId>
2236- </exclusion>
2237- </exclusions>
2238- </dependency>
2239- </dependencies>
2240- </dependencyManagement>
I also set properties for spark.executor.userClassPathFirst=true and
spark.driver.userClassPathFirst=true
D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1
-Dspark.executor.userClassPathFirst=true
-Dspark.driver.userClassPathFirst=true -DskipTests clean package
and rebuilt spark.
But I got the same error when running spark-shell.
[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 25.092 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 22.093 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 19.546 s]
[INFO] Spark Project Local DB ............................. SUCCESS [ 10.468 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 17.733 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  6.531 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 25.327 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 27.264 s]
[INFO] Spark Project Core ................................. SUCCESS [07:59 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [01:39 min]
[INFO] Spark Project GraphX ............................... SUCCESS [02:08 min]
[INFO] Spark Project Streaming ............................ SUCCESS [02:56 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [08:55 min]
[INFO] Spark Project SQL .................................. SUCCESS [12:33 min]
[INFO] Spark Project ML Library ........................... SUCCESS [08:49 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 16.967 s]
[INFO] Spark Project Hive ................................. SUCCESS [06:15 min]
[INFO] Spark Project Graph API ............................ SUCCESS [ 10.219 s]
[INFO] Spark Project Cypher ............................... SUCCESS [ 11.952 s]
[INFO] Spark Project Graph ................................ SUCCESS [ 11.171 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 55.029 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:07 min]
[INFO] Spark Project YARN ................................. SUCCESS [02:22 min]
[INFO] Spark Project Assembly ............................. SUCCESS [ 21.483 s]
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 56.450 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:21 min]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:33 min]
[INFO] Spark Project Examples ............................. SUCCESS [02:05 min]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 30.780 s]
[INFO] Spark Avro ......................................... SUCCESS [01:43 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:08 h
[INFO] Finished at: 2019-12-06T11:43:08-08:00
[INFO] ------------------------------------------------------------------------
D:\apache\spark>spark-shell
'spark-shell' is not recognized as an internal or external command,
operable program or batch file.
D:\apache\spark>cd bin
D:\apache\spark\bin>spark-shell
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
        at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
        at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
        at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
        at org.apache.spark.deploy.SparkSubmit$$Lambda$132/1985836631.apply(Unknown Source)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Before building Spark, I went to my local Maven repo and removed Guava
entirely. But after building, I found the same versions of Guava had been
downloaded again.
D:\mavenrepo\com\google\guava\guava>ls
14.0.1 16.0.1 18.0 19.0
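As an aside, the `(ZLjava/lang/String;Ljava/lang/Object;)V` descriptor in the error is the `checkArgument(boolean, String, Object)` overload, which only exists in Guava 20.0 and later, so the crash means an older Guava jar wins on the runtime classpath. A small reflection probe can confirm which jar a class was actually loaded from; `WhichJar` is a hypothetical helper name, and the classpath in the usage note is an assumption about the build layout.

```java
public class WhichJar {

    // Where was this class loaded from? A null code source means the JVM
    // bootstrap classpath (core JDK classes); otherwise it prints the jar URL.
    public static String codeSourceOf(String className) throws ClassNotFoundException {
        Object src = Class.forName(className).getProtectionDomain().getCodeSource();
        return src == null ? "bootstrap classpath" : src.toString();
    }

    // Does the class expose checkArgument(boolean, String, Object) -- the
    // overload named in the stack trace, added in Guava 20.0?
    public static boolean hasNewCheckArgument(String className) throws ClassNotFoundException {
        try {
            Class.forName(className).getMethod("checkArgument",
                    boolean.class, String.class, Object.class);
            return true;
        } catch (NoSuchMethodException missing) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Defaults to the class from the error; pass another class name to probe it.
        String name = args.length > 0 ? args[0] : "com.google.common.base.Preconditions";
        System.out.println(name + " loaded from: " + codeSourceOf(name));
        System.out.println("Guava>=20 checkArgument overload present: " + hasNewCheckArgument(name));
    }
}
```

Run it against the same jars the launcher uses, e.g. `java -cp "assembly\target\scala-2.12\jars\*" WhichJar` (path assumed); if it reports a guava-14.0.1 jar and a missing overload, the old Guava is the one being loaded.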
On Thu, Dec 5, 2019 at 5:12 PM Deepak Vohra <dv...@yahoo.com> wrote:
> Just to clarify, excluding Hadoop provided guava in pom.xml is an
> alternative to using an Uber jar, which is a more involved process.
>
> On Thursday, December 5, 2019, 10:37:39 p.m. UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
>
>
> Hi Sean,
>
> Thanks for your response!
>
> Sorry, I didn't mention that "build/mvn ..." doesn't work. So I went to the
> Spark home directory and ran mvn from there. The following is my build and
> run result. The source code was updated just yesterday. I guess the POM
> should somehow specify a newer Guava library.
>
> Thanks Sean.
>
> Ping
>
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ........................... SUCCESS [ 14.794 s]
> [INFO] Spark Project Tags ................................. SUCCESS [ 18.233 s]
> [INFO] Spark Project Sketch ............................... SUCCESS [ 20.077 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [  7.846 s]
> [INFO] Spark Project Networking ........................... SUCCESS [ 14.906 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  6.267 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [ 31.710 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [ 10.227 s]
> [INFO] Spark Project Core ................................. SUCCESS [08:03 min]
> [INFO] Spark Project ML Local Library ..................... SUCCESS [01:51 min]
> [INFO] Spark Project GraphX ............................... SUCCESS [02:20 min]
> [INFO] Spark Project Streaming ............................ SUCCESS [03:16 min]
> [INFO] Spark Project Catalyst ............................. SUCCESS [08:45 min]
> [INFO] Spark Project SQL .................................. SUCCESS [12:12 min]
> [INFO] Spark Project ML Library ........................... SUCCESS [ 16:28 h]
> [INFO] Spark Project Tools ................................ SUCCESS [ 23.602 s]
> [INFO] Spark Project Hive ................................. SUCCESS [07:50 min]
> [INFO] Spark Project Graph API ............................ SUCCESS [  8.734 s]
> [INFO] Spark Project Cypher ............................... SUCCESS [ 12.420 s]
> [INFO] Spark Project Graph ................................ SUCCESS [ 10.186 s]
> [INFO] Spark Project REPL ................................. SUCCESS [01:03 min]
> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:19 min]
> [INFO] Spark Project YARN ................................. SUCCESS [02:19 min]
> [INFO] Spark Project Assembly ............................. SUCCESS [ 18.912 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 57.925 s]
> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:20 min]
> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:26 min]
> [INFO] Spark Project Examples ............................. SUCCESS [02:00 min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 28.354 s]
> [INFO] Spark Avro ......................................... SUCCESS [01:44 min]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 17:30 h
> [INFO] Finished at: 2019-12-05T12:20:01-08:00
> [INFO] ------------------------------------------------------------------------
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>ls
> beeline              load-spark-env.cmd  run-example        spark-shell      spark-sql2.cmd     sparkR.cmd
> beeline.cmd          load-spark-env.sh   run-example.cmd    spark-shell.cmd  spark-submit       sparkR2.cmd
> docker-image-tool.sh pyspark             spark-class        spark-shell2.cmd spark-submit.cmd
> find-spark-home      pyspark.cmd         spark-class.cmd    spark-sql        spark-submit2.cmd
> find-spark-home.cmd  pyspark2.cmd        spark-class2.cmd   spark-sql.cmd    sparkR
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
>         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
>         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
>         at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
>         at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
>         at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>         at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
>         at scala.Option.getOrElse(Option.scala:189)
>         at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
>         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
>         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> D:\apache\spark\bin>
>
> On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
>
> What was the build error? You didn't say. Are you sure it succeeded?
> Try running from the Spark home dir, not bin.
> I know we do run Windows tests and they appear to pass, etc.
>
> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
> >
> > Hello,
> >
> > I understand Spark is preferably built on Linux. But I have a Windows
> > machine with a slow VirtualBox VM for Linux. So I would like to be able
> > to build and run Spark in a Windows environment.
> >
> > Unfortunately,
> >
> > # Apache Hadoop 2.6.X
> > ./build/mvn -Pyarn -DskipTests clean package
> >
> > # Apache Hadoop 2.7.X and later
> > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package
> >
> >
> > Both are listed on
> http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
> >
> > But neither works for me (I stay directly under the spark root directory
> > and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> > package").
> >
> > and
> >
> > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests
> clean package"
> >
> > Now the build works. But when I run spark-shell, I get the following error.
> >
> > D:\apache\spark\bin>spark-shell
> > Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> >         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> >         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> >         at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> >         at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> >         at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> >         at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
> >         at scala.Option.getOrElse(Option.scala:189)
> >         at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> >         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> >         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> >         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> >         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> >         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> >         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> >         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >
> >
> > Has anyone experienced building and running Spark source code
> successfully on Windows? Could you please share your experience?
> >
> > Thanks a lot!
> >
> > Ping
> >
>
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Deepak Vohra <dv...@yahoo.com.INVALID>.
Just to clarify, excluding the Hadoop-provided Guava in pom.xml is an alternative to using an uber jar, which is a more involved process.
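The "more involved" uber-jar route usually means shading: relocating Guava's packages in the application's own jar so its copy can never collide with the Hadoop-provided one. A minimal maven-shade-plugin sketch follows; the `shaded.` prefix is an arbitrary illustrative choice, and plugin version management is omitted.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Rewrite com.google.common.* references in the uber jar so they
               resolve to the bundled Guava, not whatever is on the cluster. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```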
On Thursday, December 5, 2019, 10:37:39 p.m. UTC, Ping Liu <pi...@gmail.com> wrote:
Hi Sean,
Thanks for your response!
Sorry, I didn't mention that "build/mvn ..." doesn't work. So I went to the Spark home directory and ran mvn from there. The following is my build and run result. The source code was updated just yesterday. I guess the POM should somehow specify a newer Guava library.
Thanks Sean.
Ping
[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 14.794 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 18.233 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 20.077 s]
[INFO] Spark Project Local DB ............................. SUCCESS [ 7.846 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 14.906 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 6.267 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 31.710 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 10.227 s]
[INFO] Spark Project Core ................................. SUCCESS [08:03 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [01:51 min]
[INFO] Spark Project GraphX ............................... SUCCESS [02:20 min]
[INFO] Spark Project Streaming ............................ SUCCESS [03:16 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [08:45 min]
[INFO] Spark Project SQL .................................. SUCCESS [12:12 min]
[INFO] Spark Project ML Library ........................... SUCCESS [ 16:28 h]
[INFO] Spark Project Tools ................................ SUCCESS [ 23.602 s]
[INFO] Spark Project Hive ................................. SUCCESS [07:50 min]
[INFO] Spark Project Graph API ............................ SUCCESS [ 8.734 s]
[INFO] Spark Project Cypher ............................... SUCCESS [ 12.420 s]
[INFO] Spark Project Graph ................................ SUCCESS [ 10.186 s]
[INFO] Spark Project REPL ................................. SUCCESS [01:03 min]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:19 min]
[INFO] Spark Project YARN ................................. SUCCESS [02:19 min]
[INFO] Spark Project Assembly ............................. SUCCESS [ 18.912 s]
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 57.925 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:20 min]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:26 min]
[INFO] Spark Project Examples ............................. SUCCESS [02:00 min]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 28.354 s]
[INFO] Spark Avro ......................................... SUCCESS [01:44 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 17:30 h
[INFO] Finished at: 2019-12-05T12:20:01-08:00
[INFO] ------------------------------------------------------------------------
D:\apache\spark>cd bin
D:\apache\spark\bin>ls
beeline load-spark-env.cmd run-example spark-shell spark-sql2.cmd sparkR.cmd
beeline.cmd load-spark-env.sh run-example.cmd spark-shell.cmd spark-submit sparkR2.cmd
docker-image-tool.sh pyspark spark-class spark-shell2.cmd spark-submit.cmd
find-spark-home pyspark.cmd spark-class.cmd spark-sql spark-submit2.cmd
find-spark-home.cmd pyspark2.cmd spark-class2.cmd spark-sql.cmd sparkR
D:\apache\spark\bin>spark-shell
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
D:\apache\spark\bin>
On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
What was the build error? You didn't say. Are you sure it succeeded?
Try running from the Spark home dir, not bin.
I know we do run Windows tests and it appears to pass tests, etc.
On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
>
> Hello,
>
> I understand Spark is preferably built on Linux. But I have a Windows machine with a slow VirtualBox VM for Linux. So I would like to be able to build and run Spark in a Windows environment.
>
> Unfortunately,
>
> # Apache Hadoop 2.6.X
> ./build/mvn -Pyarn -DskipTests clean package
>
> # Apache Hadoop 2.7.X and later
> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
>
>
> Both are listed on http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
>
> But neither works for me (I stay directly under spark root directory and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package"
>
> and
>
> Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests clean package"
>
> Now build works. But when I run spark-shell. I got the following error.
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
> at scala.Option.getOrElse(Option.scala:189)
> at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
> Has anyone experienced building and running Spark source code successfully on Windows? Could you please share your experience?
>
> Thanks a lot!
>
> Ping
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Ping Liu <pi...@gmail.com>.
Hi Sean,
Thanks for your response!
Sorry, I didn't mention that "build/mvn ..." doesn't work. So I did go to the
Spark home directory and ran mvn from there. The following is my build and
run result. The source code was updated just yesterday. I guess the
POM should specify a newer Guava library somehow.
Thanks Sean.
Ping
[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 14.794 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 18.233 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 20.077 s]
[INFO] Spark Project Local DB ............................. SUCCESS [  7.846 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 14.906 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  6.267 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 31.710 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 10.227 s]
[INFO] Spark Project Core ................................. SUCCESS [08:03 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [01:51 min]
[INFO] Spark Project GraphX ............................... SUCCESS [02:20 min]
[INFO] Spark Project Streaming ............................ SUCCESS [03:16 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [08:45 min]
[INFO] Spark Project SQL .................................. SUCCESS [12:12 min]
[INFO] Spark Project ML Library ........................... SUCCESS [16:28 h]
[INFO] Spark Project Tools ................................ SUCCESS [ 23.602 s]
[INFO] Spark Project Hive ................................. SUCCESS [07:50 min]
[INFO] Spark Project Graph API ............................ SUCCESS [  8.734 s]
[INFO] Spark Project Cypher ............................... SUCCESS [ 12.420 s]
[INFO] Spark Project Graph ................................ SUCCESS [ 10.186 s]
[INFO] Spark Project REPL ................................. SUCCESS [01:03 min]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:19 min]
[INFO] Spark Project YARN ................................. SUCCESS [02:19 min]
[INFO] Spark Project Assembly ............................. SUCCESS [ 18.912 s]
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 57.925 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:20 min]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:26 min]
[INFO] Spark Project Examples ............................. SUCCESS [02:00 min]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 28.354 s]
[INFO] Spark Avro ......................................... SUCCESS [01:44 min]
[INFO]
------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO]
------------------------------------------------------------------------
[INFO] Total time: 17:30 h
[INFO] Finished at: 2019-12-05T12:20:01-08:00
[INFO]
------------------------------------------------------------------------
D:\apache\spark>cd bin

D:\apache\spark\bin>ls
beeline               load-spark-env.cmd   run-example       spark-shell       spark-sql2.cmd     sparkR.cmd
beeline.cmd           load-spark-env.sh    run-example.cmd   spark-shell.cmd   spark-submit       sparkR2.cmd
docker-image-tool.sh  pyspark              spark-class       spark-shell2.cmd  spark-submit.cmd
find-spark-home       pyspark.cmd          spark-class.cmd   spark-sql         spark-submit2.cmd
find-spark-home.cmd   pyspark2.cmd         spark-class2.cmd  spark-sql.cmd     sparkR
D:\apache\spark\bin>spark-shell
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
D:\apache\spark\bin>
On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
> What was the build error? you didn't say. Are you sure it succeeded?
> Try running from the Spark home dir, not bin.
> I know we do run Windows tests and it appears to pass tests, etc.
>
> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
> >
> > Hello,
> >
> > I understand Spark is preferably built on Linux. But I have a Windows
> machine with a slow Virtual Box for Linux. So I wish I am able to build
> and run Spark code on Windows environment.
> >
> > Unfortunately,
> >
> > # Apache Hadoop 2.6.X
> > ./build/mvn -Pyarn -DskipTests clean package
> >
> > # Apache Hadoop 2.7.X and later
> > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package
> >
> >
> > Both are listed on
> http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
> >
> > But neither works for me (I stay directly under spark root directory and
> run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package"
> >
> > and
> >
> > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests
> clean package"
> >
> > Now build works. But when I run spark-shell. I got the following error.
> >
> > D:\apache\spark\bin>spark-shell
> > Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> > at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> > at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown
> Source)
> > at scala.Option.getOrElse(Option.scala:189)
> > at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> > at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> > at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> > at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> > at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> > at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> > at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >
> >
> > Has anyone experienced building and running Spark source code
> successfully on Windows? Could you please share your experience?
> >
> > Thanks a lot!
> >
> > Ping
> >
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Deepak Vohra <dv...@yahoo.com.INVALID>.
Is Hadoop 3.x set as a dependency? If so, exclude the Guava provided by Hadoop.
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>3.2.1</version>
  <exclusions>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
  </exclusions>
</dependency>
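A related approach often used for Guava conflicts (a sketch, not taken from this thread; the plugin version and the relocated package prefix are illustrative) is to shade and relocate Guava in the application's pom.xml, so the older copy that Hadoop ships no longer collides:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Rewrite Guava's package in the application's classes and
               bundled copy, so Hadoop's Guava cannot shadow it. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>repackaged.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this in place, the uber JAR produced by `mvn package` carries its own relocated Guava, independent of whatever version Hadoop puts on the classpath.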
On Friday, December 6, 2019, 12:20:49 AM UTC, Ping Liu <pi...@gmail.com> wrote:
Thanks Deepak! I'll try it.
On Thu, Dec 5, 2019 at 4:13 PM Deepak Vohra <dv...@yahoo.com> wrote:
The Guava issue could be fixed in one of two ways:
- Use Hadoop v3
- Create an uber JAR; refer to https://gite.lirmm.fr/yagoubi/spark/commit/c9f743957fa963bc1dbed7a44a346ffce1a45cf2

Managing Java dependencies for Apache Spark applications on Cloud Dataproc | Google Cloud Blog
Learn how to set up Java imported packages for Apache Spark on Cloud Dataproc to avoid conflicts.
On Thursday, December 5, 2019, 11:49:47 PM UTC, Ping Liu <pi...@gmail.com> wrote:
Hi Deepak,
For Spark, I am using master branch and just have code updated yesterday.
For Guava, I actually deleted my old versions from the local Maven repo. The build process of Spark automatically downloaded a few versions. The oldest version is 14.0.1.
But even in 14.0.1 (https://guava.dev/releases/14.0.1/api/docs/com/google/common/base/Preconditions.html) Preconditions already requires a boolean as the first parameter:

static void checkArgument(boolean expression, String errorMessageTemplate, Object... errorMessageArgs)

In newer Guava versions, all checkArgument() overloads likewise require a boolean as the first parameter.
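The method descriptor in the NoSuchMethodError pins down the exact overload the bytecode expects. A small sketch of how to read the JVM type codes (Z = boolean, Ljava/lang/String; = String, V = void return):

```shell
# JVM method descriptor from the stack trace:
#   (ZLjava/lang/String;Ljava/lang/Object;)V
# Z = boolean, L<class>; = an object reference, V = void return.
desc='(ZLjava/lang/String;Ljava/lang/Object;)V'

# Strip the parentheses and expand the type codes into readable names.
params=$(echo "$desc" | sed -e 's/^(//' -e 's/).*$//' \
  -e 's/Ljava\/lang\/String;/String /g' \
  -e 's/Ljava\/lang\/Object;/Object /g' \
  -e 's/Z/boolean /g' -e 's/ *$//')

echo "checkArgument($params) returns void"
# -> checkArgument(boolean String Object) returns void
```

Note the single (non-varargs) Object parameter: Guava 14.0.1 only has the varargs `checkArgument(boolean, String, Object...)`, so a runtime classpath carrying an old Guava cannot satisfy this descriptor, which is one plausible reading of the error here.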
For Docker, using EC2 is a good idea. Is there a document or guidance for it?
Thanks.
Ping
On Thu, Dec 5, 2019 at 3:30 PM Deepak Vohra <dv...@yahoo.com> wrote:
This type of exception can occur when a dependency version (most likely Guava's) is not supported by the Spark version. What are the Spark and Guava versions? Use a more recent Guava version dependency in the Maven pom.xml.
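Pinning a single Guava version across the build, as suggested here, might look like the following pom.xml sketch (the version number is illustrative, not from this thread):

```xml
<dependencyManagement>
  <dependencies>
    <!-- Force every transitive reference to Guava onto one version. -->
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>27.0-jre</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Maven's dependencyManagement section overrides the versions that Hadoop and other dependencies would otherwise pull in transitively.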
Regarding Docker, a cloud platform instance such as EC2 could be used with Hyper-V support.
On Thursday, December 5, 2019, 10:51:59 PM UTC, Ping Liu <pi...@gmail.com> wrote:
Hi Deepak,
Yes, I did use Maven. I even have the build pass successfully when setting Hadoop version to 3.2. Please see my response to Sean's email.
Unfortunately, I only have Docker Toolbox as my Windows doesn't have Microsoft Hyper-V. So I want to avoid using Docker to do major work if possible.
Thanks!
Ping
On Thu, Dec 5, 2019 at 2:24 PM Deepak Vohra <dv...@yahoo.com> wrote:
Several alternatives are available:
- Use Maven to build Spark on Windows. http://spark.apache.org/docs/latest/building-spark.html#apache-maven
- Use a Docker image for CDH on Windows: Docker Hub (https://hub.docker.com/u/cloudera)
On Thursday, December 5, 2019, 09:33:43 p.m. UTC, Sean Owen <sr...@gmail.com> wrote:
What was the build error? you didn't say. Are you sure it succeeded?
Try running from the Spark home dir, not bin.
I know we do run Windows tests and it appears to pass tests, etc.
On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
>
> Hello,
>
> I understand Spark is preferably built on Linux. But I have a Windows machine with a slow Virtual Box for Linux. So I wish I am able to build and run Spark code on Windows environment.
>
> Unfortunately,
>
> # Apache Hadoop 2.6.X
> ./build/mvn -Pyarn -DskipTests clean package
>
> # Apache Hadoop 2.7.X and later
> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
>
>
> Both are listed on http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
>
> But neither works for me (I stay directly under spark root directory and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package"
>
> and
>
> Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests clean package"
>
> Now build works. But when I run spark-shell. I got the following error.
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
> at scala.Option.getOrElse(Option.scala:189)
> at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
> Has anyone experienced building and running Spark source code successfully on Windows? Could you please share your experience?
>
> Thanks a lot!
>
> Ping
>
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: Is it feasible to build and run Spark on Windows?
Posted by Ping Liu <pi...@gmail.com>.
Thanks Deepak! I'll try it.
On Thu, Dec 5, 2019 at 4:13 PM Deepak Vohra <dv...@yahoo.com> wrote:
> The Guava issue could be fixed in one of two ways:
>
> - Use Hadoop v3
> - Create an Uber jar, refer
>
> https://gite.lirmm.fr/yagoubi/spark/commit/c9f743957fa963bc1dbed7a44a346ffce1a45cf2
> Managing Java dependencies for Apache Spark applications on Cloud
> Dataproc | Google Cloud Blog
> <https://cloud.google.com/blog/products/data-analytics/managing-java-dependencies-apache-spark-applications-cloud-dataproc>
>
> Managing Java dependencies for Apache Spark applications on Cloud Datapr...
>
> Learn how to set up Java imported packages for Apache Spark on Cloud
> Dataproc to avoid conflicts.
>
> <https://cloud.google.com/blog/products/data-analytics/managing-java-dependencies-apache-spark-applications-cloud-dataproc>
>
>
>
> On Thursday, December 5, 2019, 11:49:47 PM UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
>
>
> Hi Deepak,
>
> For Spark, I am using master branch and just have code updated yesterday.
>
> For Guava, I actually deleted my old versions from the local Maven repo.
> The build process of Spark automatically downloaded a few versions. The
> oldest version is 14.0.1.
>
> But even in 14.0.1 (
> https://guava.dev/releases/14.0.1/api/docs/com/google/common/base/Preconditions.html)
> Preconditions already requires boolean as first parameter.
>
> static void *checkArgument
> <https://guava.dev/releases/14.0.1/api/docs/com/google/common/base/Preconditions.html#checkArgument(boolean,%20java.lang.String,%20java.lang.Object...)>*(boolean expression,
> String
> <http://download.oracle.com/javase/6/docs/api/java/lang/String.html?is-external=true> errorMessageTemplate,
> Object
> <http://download.oracle.com/javase/6/docs/api/java/lang/Object.html?is-external=true>
> ... errorMessageArgs)
>
> The newer Guava version, checkArgument() all require boolean as first
> parameter.
>
> For Docker, using EC2 is a good idea. Is there a document or guidance for
> it?
>
> Thanks.
>
> Ping
>
>
>
> On Thu, Dec 5, 2019 at 3:30 PM Deepak Vohra <dv...@yahoo.com> wrote:
>
> Such type exception could occur if a dependency (most likely Guava)
> version is not supported by the Spark version. What is the Spark and Guava
> versions? Use a more recent Guava version dependency in Maven pom.xml.
>
> Regarding Docker, a cloud platform instance such as EC2 could be used with
> Hyper-V support.
>
> On Thursday, December 5, 2019, 10:51:59 PM UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
>
>
> Hi Deepak,
>
> Yes, I did use Maven. I even have the build pass successfully when setting
> Hadoop version to 3.2. Please see my response to Sean's email.
>
> Unfortunately, I only have Docker Toolbox as my Windows doesn't have
> Microsoft Hyper-V. So I want to avoid using Docker to do major work if
> possible.
>
> Thanks!
>
> Ping
>
>
> On Thu, Dec 5, 2019 at 2:24 PM Deepak Vohra <dv...@yahoo.com> wrote:
>
> Several alternatives are available:
>
> - Use Maven to build Spark on Windows.
> http://spark.apache.org/docs/latest/building-spark.html#apache-maven
>
> - Use Docker image for CDH on Windows
> Docker Hub <https://hub.docker.com/u/cloudera>
>
> Docker Hub
>
> <https://hub.docker.com/u/cloudera>
>
>
>
>
> On Thursday, December 5, 2019, 09:33:43 p.m. UTC, Sean Owen <
> srowen@gmail.com> wrote:
>
>
> What was the build error? you didn't say. Are you sure it succeeded?
> Try running from the Spark home dir, not bin.
> I know we do run Windows tests and it appears to pass tests, etc.
>
> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
> >
> > Hello,
> >
> > I understand Spark is preferably built on Linux. But I have a Windows
> machine with a slow Virtual Box for Linux. So I wish I am able to build
> and run Spark code on Windows environment.
> >
> > Unfortunately,
> >
> > # Apache Hadoop 2.6.X
> > ./build/mvn -Pyarn -DskipTests clean package
> >
> > # Apache Hadoop 2.7.X and later
> > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package
> >
> >
> > Both are listed on
> http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
> >
> > But neither works for me (I stay directly under spark root directory and
> run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package"
> >
> > and
> >
> > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests
> clean package"
> >
> > Now build works. But when I run spark-shell. I got the following error.
> >
> > D:\apache\spark\bin>spark-shell
> > Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> > at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> > at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown
> Source)
> > at scala.Option.getOrElse(Option.scala:189)
> > at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> > at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> > at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> > at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> > at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> > at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> > at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >
> >
> > Has anyone experienced building and running Spark source code
> successfully on Windows? Could you please share your experience?
> >
> > Thanks a lot!
> >
> > Ping
>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Deepak Vohra <dv...@yahoo.com.INVALID>.
The Guava issue could be fixed in one of two ways:
- Use Hadoop v3
- Create an uber JAR; refer to https://gite.lirmm.fr/yagoubi/spark/commit/c9f743957fa963bc1dbed7a44a346ffce1a45cf2
Managing Java dependencies for Apache Spark applications on Cloud Dataproc | Google Cloud Blog
Managing Java dependencies for Apache Spark applications on Cloud Datapr...
Learn how to set up Java imported packages for Apache Spark on Cloud Dataproc to avoid conflicts.
On Thursday, December 5, 2019, 11:49:47 PM UTC, Ping Liu <pi...@gmail.com> wrote:
Hi Deepak,
For Spark, I am using master branch and just have code updated yesterday.
For Guava, I actually deleted my old versions from the local Maven repo. The build process of Spark automatically downloaded a few versions. The oldest version is 14.0.1.
But even in 14.0.1 (https://guava.dev/releases/14.0.1/api/docs/com/google/common/base/Preconditions.html) Preconditions already requires a boolean as the first parameter:
static void checkArgument(boolean expression, String errorMessageTemplate, Object... errorMessageArgs)
In newer Guava versions, all checkArgument() overloads likewise require a boolean as the first parameter.
For Docker, using EC2 is a good idea. Is there a document or guidance for it?
Thanks.
Ping
On Thu, Dec 5, 2019 at 3:30 PM Deepak Vohra <dv...@yahoo.com> wrote:
This type of exception can occur when a dependency version (most likely Guava's) is not supported by the Spark version. What are the Spark and Guava versions? Use a more recent Guava version dependency in the Maven pom.xml.
Regarding Docker, a cloud platform instance such as EC2 could be used with Hyper-V support.
On Thursday, December 5, 2019, 10:51:59 PM UTC, Ping Liu <pi...@gmail.com> wrote:
Hi Deepak,
Yes, I did use Maven. I even have the build pass successfully when setting Hadoop version to 3.2. Please see my response to Sean's email.
Unfortunately, I only have Docker Toolbox as my Windows doesn't have Microsoft Hyper-V. So I want to avoid using Docker to do major work if possible.
Thanks!
Ping
On Thu, Dec 5, 2019 at 2:24 PM Deepak Vohra <dv...@yahoo.com> wrote:
Several alternatives are available:
- Use Maven to build Spark on Windows. http://spark.apache.org/docs/latest/building-spark.html#apache-maven
- Use a Docker image for CDH on Windows: Docker Hub (https://hub.docker.com/u/cloudera)
On Thursday, December 5, 2019, 09:33:43 p.m. UTC, Sean Owen <sr...@gmail.com> wrote:
What was the build error? you didn't say. Are you sure it succeeded?
Try running from the Spark home dir, not bin.
I know we do run Windows tests and it appears to pass tests, etc.
On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
>
> Hello,
>
> I understand Spark is preferably built on Linux. But I have a Windows machine with a slow Virtual Box for Linux. So I wish I am able to build and run Spark code on Windows environment.
>
> Unfortunately,
>
> # Apache Hadoop 2.6.X
> ./build/mvn -Pyarn -DskipTests clean package
>
> # Apache Hadoop 2.7.X and later
> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
>
>
> Both are listed on http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
>
> But neither works for me (I stay directly under spark root directory and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package"
>
> and
>
> Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests clean package"
>
> Now build works. But when I run spark-shell. I got the following error.
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
> at scala.Option.getOrElse(Option.scala:189)
> at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
> Has anyone experienced building and running Spark source code successfully on Windows? Could you please share your experience?
>
> Thanks a lot!
>
> Ping
>
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: Is it feasible to build and run Spark on Windows?
Posted by Deepak Vohra <dv...@yahoo.com.INVALID>.
Sorry, didn't notice, Hadoop v3.x is already being used.
On Thursday, December 5, 2019, 11:49:47 PM UTC, Ping Liu <pi...@gmail.com> wrote:
Hi Deepak,
For Spark, I am using master branch and just have code updated yesterday.
For Guava, I actually deleted my old versions from the local Maven repo. The build process of Spark automatically downloaded a few versions. The oldest version is 14.0.1.
But even in 14.0.1 (https://guava.dev/releases/14.0.1/api/docs/com/google/common/base/Preconditions.html), Preconditions already requires a boolean as the first parameter:
static void checkArgument(boolean expression, String errorMessageTemplate, Object... errorMessageArgs)
Newer Guava versions likewise require a boolean as the first parameter in all checkArgument() overloads.
For Docker, using EC2 is a good idea. Is there a document or guidance for it?
Thanks.
Ping
On Thu, Dec 5, 2019 at 3:30 PM Deepak Vohra <dv...@yahoo.com> wrote:
Such an exception can occur if a dependency version (most likely Guava) is not supported by the Spark version. What are the Spark and Guava versions? Use a more recent Guava version dependency in the Maven pom.xml.
Regarding Docker, a cloud platform instance such as EC2 could be used with Hyper-V support.
On Thursday, December 5, 2019, 10:51:59 PM UTC, Ping Liu <pi...@gmail.com> wrote:
Hi Deepak,
Yes, I did use Maven. I even have the build pass successfully when setting Hadoop version to 3.2. Please see my response to Sean's email.
Unfortunately, I only have Docker Toolbox as my Windows doesn't have Microsoft Hyper-V. So I want to avoid using Docker to do major work if possible.
Thanks!
Ping
On Thu, Dec 5, 2019 at 2:24 PM Deepak Vohra <dv...@yahoo.com> wrote:
Several alternatives are available:
- Use Maven to build Spark on Windows. http://spark.apache.org/docs/latest/building-spark.html#apache-maven
- Use Docker image for CDH on Windows: Docker Hub (https://hub.docker.com/u/cloudera)
On Thursday, December 5, 2019, 09:33:43 p.m. UTC, Sean Owen <sr...@gmail.com> wrote:
What was the build error? You didn't say. Are you sure it succeeded?
Try running from the Spark home dir, not bin.
I know we do run tests on Windows and they appear to pass, etc.
On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
>
> Hello,
>
> I understand Spark is preferably built on Linux. But I have a Windows machine with a slow Virtual Box for Linux. So I wish I am able to build and run Spark code on Windows environment.
>
> Unfortunately,
>
> # Apache Hadoop 2.6.X
> ./build/mvn -Pyarn -DskipTests clean package
>
> # Apache Hadoop 2.7.X and later
> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
>
>
> Both are listed on http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
>
> But neither works for me (I stay directly under spark root directory and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package"
>
> and
>
> Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests clean package"
>
> Now build works. But when I run spark-shell. I got the following error.
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
> at scala.Option.getOrElse(Option.scala:189)
> at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
> Has anyone experienced building and running Spark source code successfully on Windows? Could you please share your experience?
>
> Thanks a lot!
>
> Ping
>
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: Is it feasible to build and run Spark on Windows?
Posted by Ping Liu <pi...@gmail.com>.
Hi Deepak,
For Spark, I am using master branch and just have code updated yesterday.
For Guava, I actually deleted my old versions from the local Maven repo.
The build process of Spark automatically downloaded a few versions. The
oldest version is 14.0.1.
But even in 14.0.1 (
https://guava.dev/releases/14.0.1/api/docs/com/google/common/base/Preconditions.html)
Preconditions already requires a boolean as the first parameter:
static void checkArgument(boolean expression, String errorMessageTemplate, Object... errorMessageArgs)
Newer Guava versions likewise require a boolean as the first parameter in all
checkArgument() overloads.
For Docker, using EC2 is a good idea. Is there a document or guidance for
it?
Thanks.
Ping
On Thu, Dec 5, 2019 at 3:30 PM Deepak Vohra <dv...@yahoo.com> wrote:
> Such type exception could occur if a dependency (most likely Guava)
> version is not supported by the Spark version. What is the Spark and Guava
> versions? Use a more recent Guava version dependency in Maven pom.xml.
>
> Regarding Docker, a cloud platform instance such as EC2 could be used with
> Hyper-V support.
>
> On Thursday, December 5, 2019, 10:51:59 PM UTC, Ping Liu <
> pingpinganan@gmail.com> wrote:
>
>
> Hi Deepak,
>
> Yes, I did use Maven. I even have the build pass successfully when setting
> Hadoop version to 3.2. Please see my response to Sean's email.
>
> Unfortunately, I only have Docker Toolbox as my Windows doesn't have
> Microsoft Hyper-V. So I want to avoid using Docker to do major work if
> possible.
>
> Thanks!
>
> Ping
>
>
> On Thu, Dec 5, 2019 at 2:24 PM Deepak Vohra <dv...@yahoo.com> wrote:
>
> Several alternatives are available:
>
> - Use Maven to build Spark on Windows.
> http://spark.apache.org/docs/latest/building-spark.html#apache-maven
>
> - Use Docker image for CDH on Windows
> Docker Hub <https://hub.docker.com/u/cloudera>
>
> On Thursday, December 5, 2019, 09:33:43 p.m. UTC, Sean Owen <
> srowen@gmail.com> wrote:
>
>
> What was the build error? you didn't say. Are you sure it succeeded?
> Try running from the Spark home dir, not bin.
> I know we do run Windows tests and it appears to pass tests, etc.
>
> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
> >
> > Hello,
> >
> > I understand Spark is preferably built on Linux. But I have a Windows
> machine with a slow Virtual Box for Linux. So I wish I am able to build
> and run Spark code on Windows environment.
> >
> > Unfortunately,
> >
> > # Apache Hadoop 2.6.X
> > ./build/mvn -Pyarn -DskipTests clean package
> >
> > # Apache Hadoop 2.7.X and later
> > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package
> >
> >
> > Both are listed on
> http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
> >
> > But neither works for me (I stay directly under spark root directory and
> run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package"
> >
> > and
> >
> > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests
> clean package"
> >
> > Now build works. But when I run spark-shell. I got the following error.
> >
> > D:\apache\spark\bin>spark-shell
> > Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> > at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> > at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown
> Source)
> > at scala.Option.getOrElse(Option.scala:189)
> > at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> > at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> > at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> > at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> > at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> > at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> > at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >
> >
> > Has anyone experienced building and running Spark source code
> successfully on Windows? Could you please share your experience?
> >
> > Thanks a lot!
> >
> > Ping
>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Deepak Vohra <dv...@yahoo.com.INVALID>.
Such an exception can occur if a dependency version (most likely Guava) is not supported by the Spark version. What are the Spark and Guava versions? Use a more recent Guava version dependency in the Maven pom.xml.
Regarding Docker, a cloud platform instance such as EC2 could be used with Hyper-V support.
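A quick way to confirm a Guava conflict like this is to check which jar a class is actually loaded from. The following is a minimal JDK-only sketch (the default class name is just an example; on the Spark classpath you would pass com.google.common.base.Preconditions, and JDK classes report null because they come from the bootstrap loader):

```java
// Sketch: print which jar (if any) a class was loaded from. Useful for
// debugging NoSuchMethodError version conflicts on the Spark classpath.
public class WhichJar {
    // Returns the code-source URL as a string, or null for bootstrap
    // classes such as java.lang.String.
    static String location(Class<?> c) {
        java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
        return src == null ? null : src.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // e.g. java WhichJar com.google.common.base.Preconditions
        String name = args.length > 0 ? args[0] : "java.lang.String";
        Class<?> c = Class.forName(name);
        System.out.println(c.getName() + " loaded from: " + location(c));
    }
}
```

Running this (for example from the spark-shell) against Preconditions should reveal whether an older Guava jar is shadowing the newer one.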
On Thursday, December 5, 2019, 10:51:59 PM UTC, Ping Liu <pi...@gmail.com> wrote:
Hi Deepak,
Yes, I did use Maven. I even have the build pass successfully when setting Hadoop version to 3.2. Please see my response to Sean's email.
Unfortunately, I only have Docker Toolbox as my Windows doesn't have Microsoft Hyper-V. So I want to avoid using Docker to do major work if possible.
Thanks!
Ping
On Thu, Dec 5, 2019 at 2:24 PM Deepak Vohra <dv...@yahoo.com> wrote:
Several alternatives are available:
- Use Maven to build Spark on Windows. http://spark.apache.org/docs/latest/building-spark.html#apache-maven
- Use Docker image for CDH on Windows: Docker Hub (https://hub.docker.com/u/cloudera)
On Thursday, December 5, 2019, 09:33:43 p.m. UTC, Sean Owen <sr...@gmail.com> wrote:
What was the build error? You didn't say. Are you sure it succeeded?
Try running from the Spark home dir, not bin.
I know we do run tests on Windows and they appear to pass, etc.
On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
>
> Hello,
>
> I understand Spark is preferably built on Linux. But I have a Windows machine with a slow Virtual Box for Linux. So I wish I am able to build and run Spark code on Windows environment.
>
> Unfortunately,
>
> # Apache Hadoop 2.6.X
> ./build/mvn -Pyarn -DskipTests clean package
>
> # Apache Hadoop 2.7.X and later
> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
>
>
> Both are listed on http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
>
> But neither works for me (I stay directly under spark root directory and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package"
>
> and
>
> Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests clean package"
>
> Now build works. But when I run spark-shell. I got the following error.
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
> at scala.Option.getOrElse(Option.scala:189)
> at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
> Has anyone experienced building and running Spark source code successfully on Windows? Could you please share your experience?
>
> Thanks a lot!
>
> Ping
>
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: Is it feasible to build and run Spark on Windows?
Posted by Ping Liu <pi...@gmail.com>.
Hi Deepak,
Yes, I did use Maven. I even got the build to pass successfully when setting
the Hadoop version to 3.2. Please see my response to Sean's email.
Unfortunately, I only have Docker Toolbox as my Windows doesn't have
Microsoft Hyper-V. So I want to avoid using Docker to do major work if
possible.
Thanks!
Ping
On Thu, Dec 5, 2019 at 2:24 PM Deepak Vohra <dv...@yahoo.com> wrote:
> Several alternatives are available:
>
> - Use Maven to build Spark on Windows.
> http://spark.apache.org/docs/latest/building-spark.html#apache-maven
>
> - Use Docker image for CDH on Windows
> Docker Hub <https://hub.docker.com/u/cloudera>
>
>
> On Thursday, December 5, 2019, 09:33:43 p.m. UTC, Sean Owen <
> srowen@gmail.com> wrote:
>
>
> What was the build error? you didn't say. Are you sure it succeeded?
> Try running from the Spark home dir, not bin.
> I know we do run Windows tests and it appears to pass tests, etc.
>
> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
> >
> > Hello,
> >
> > I understand Spark is preferably built on Linux. But I have a Windows
> machine with a slow Virtual Box for Linux. So I wish I am able to build
> and run Spark code on Windows environment.
> >
> > Unfortunately,
> >
> > # Apache Hadoop 2.6.X
> > ./build/mvn -Pyarn -DskipTests clean package
> >
> > # Apache Hadoop 2.7.X and later
> > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package
> >
> >
> > Both are listed on
> http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
> >
> > But neither works for me (I stay directly under spark root directory and
> run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package"
> >
> > and
> >
> > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests
> clean package"
> >
> > Now build works. But when I run spark-shell. I got the following error.
> >
> > D:\apache\spark\bin>spark-shell
> > Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> > at
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> > at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> > at
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> > at
> org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown
> Source)
> > at scala.Option.getOrElse(Option.scala:189)
> > at
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> > at org.apache.spark.deploy.SparkSubmit.org
> $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> > at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> > at
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> > at
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> > at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> > at
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >
> >
> > Has anyone experienced building and running Spark source code
> successfully on Windows? Could you please share your experience?
> >
> > Thanks a lot!
> >
> > Ping
>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Deepak Vohra <dv...@yahoo.com.INVALID>.
Several alternatives are available:
- Use Maven to build Spark on Windows. http://spark.apache.org/docs/latest/building-spark.html#apache-maven
- Use Docker image for CDH on Windows: Docker Hub (https://hub.docker.com/u/cloudera)
On Thursday, December 5, 2019, 09:33:43 p.m. UTC, Sean Owen <sr...@gmail.com> wrote:
What was the build error? You didn't say. Are you sure it succeeded?
Try running from the Spark home dir, not bin.
I know we do run tests on Windows and they appear to pass, etc.
On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
>
> Hello,
>
> I understand Spark is preferably built on Linux. But I have a Windows machine with a slow Virtual Box for Linux. So I wish I am able to build and run Spark code on Windows environment.
>
> Unfortunately,
>
> # Apache Hadoop 2.6.X
> ./build/mvn -Pyarn -DskipTests clean package
>
> # Apache Hadoop 2.7.X and later
> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
>
>
> Both are listed on http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
>
> But neither works for me (I stay directly under spark root directory and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package"
>
> and
>
> Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests clean package"
>
> Now build works. But when I run spark-shell. I got the following error.
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
> at scala.Option.getOrElse(Option.scala:189)
> at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
> Has anyone experienced building and running Spark source code successfully on Windows? Could you please share your experience?
>
> Thanks a lot!
>
> Ping
>
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: Is it feasible to build and run Spark on Windows?
Posted by Ping Liu <pi...@gmail.com>.
Hi Sean,
Thanks for your response!
Sorry, I didn't mention that "build/mvn ..." doesn't work for me. So I did go to
the Spark home directory and ran mvn from there. Below are my build output and
the runtime error. The source code was updated just yesterday; I guess the POM
should somehow specify a newer Guava library.
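If a newer Guava really is what's missing, one quick check is to see which Guava jar the build actually bundled, either with `mvn dependency:tree -Dincludes=com.google.guava` or by listing the jars of the built distribution. A minimal sketch of the latter idea (the helper and the `assembly\target\scala-2.12\jars` layout it assumes are illustrative, not Spark tooling):

```python
# Hypothetical diagnostic: given jar file names from a Spark build output
# directory (e.g. assembly\target\scala-2.12\jars), report the bundled
# Guava versions so they can be compared with what Hadoop expects.
import re

def guava_versions(jar_names):
    # Matches guava-14.0.1.jar, guava-27.0-jre.jar, etc.
    pattern = re.compile(r"^guava-(\d+(?:\.\d+)*)(?:-jre|-android)?\.jar$")
    return [m.group(1) for name in jar_names
            if (m := pattern.match(name)) is not None]

sample = ["guava-14.0.1.jar", "hadoop-client-api-3.2.1.jar",
          "scala-library-2.12.10.jar"]
print(guava_versions(sample))  # ['14.0.1']
```

In practice one would feed it `os.listdir(...)` of the real jars directory and compare against the Guava version Hadoop 3.2.1 was compiled against.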
Thanks Sean.
Ping
[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 14.794 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 18.233 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 20.077 s]
[INFO] Spark Project Local DB ............................. SUCCESS [  7.846 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 14.906 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  6.267 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 31.710 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 10.227 s]
[INFO] Spark Project Core ................................. SUCCESS [08:03 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [01:51 min]
[INFO] Spark Project GraphX ............................... SUCCESS [02:20 min]
[INFO] Spark Project Streaming ............................ SUCCESS [03:16 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [08:45 min]
[INFO] Spark Project SQL .................................. SUCCESS [12:12 min]
[INFO] Spark Project ML Library ........................... SUCCESS [16:28 h]
[INFO] Spark Project Tools ................................ SUCCESS [ 23.602 s]
[INFO] Spark Project Hive ................................. SUCCESS [07:50 min]
[INFO] Spark Project Graph API ............................ SUCCESS [  8.734 s]
[INFO] Spark Project Cypher ............................... SUCCESS [ 12.420 s]
[INFO] Spark Project Graph ................................ SUCCESS [ 10.186 s]
[INFO] Spark Project REPL ................................. SUCCESS [01:03 min]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:19 min]
[INFO] Spark Project YARN ................................. SUCCESS [02:19 min]
[INFO] Spark Project Assembly ............................. SUCCESS [ 18.912 s]
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 57.925 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:20 min]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:26 min]
[INFO] Spark Project Examples ............................. SUCCESS [02:00 min]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 28.354 s]
[INFO] Spark Avro ......................................... SUCCESS [01:44 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 17:30 h
[INFO] Finished at: 2019-12-05T12:20:01-08:00
[INFO] ------------------------------------------------------------------------
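As an aside, almost all of that 17:30 h total comes from the ML Library module (16:28 h). Maven prints durations in three formats; a quick illustrative normalizer (my own helper, not Maven tooling) makes the comparison explicit:

```python
# Normalize Maven reactor-summary duration stamps such as "[ 14.794 s]",
# "[08:03 min]" (mm:ss) and "[16:28 h]" (hh:mm) to seconds.
import re

def to_seconds(stamp):
    m = re.match(r"\[\s*([\d:.]+)\s*(s|min|h)\]", stamp)
    value, unit = m.group(1), m.group(2)
    if unit == "s":
        return float(value)
    first, second = value.split(":")
    if unit == "min":                      # mm:ss
        return int(first) * 60 + int(second)
    return int(first) * 3600 + int(second) * 60   # hh:mm

# Share of the total build spent in Spark Project ML Library:
print(to_seconds("[16:28 h]") / to_seconds("[17:30 h]"))
```

The ratio comes out above 0.9, which is why skipping or isolating that module is the first thing to try when iterating on a slow build.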
D:\apache\spark>cd bin
D:\apache\spark\bin>ls
beeline               load-spark-env.cmd  run-example       spark-shell       spark-sql2.cmd     sparkR.cmd
beeline.cmd           load-spark-env.sh   run-example.cmd   spark-shell.cmd   spark-submit       sparkR2.cmd
docker-image-tool.sh  pyspark             spark-class       spark-shell2.cmd  spark-submit.cmd
find-spark-home       pyspark.cmd         spark-class.cmd   spark-sql         spark-submit2.cmd
find-spark-home.cmd   pyspark2.cmd        spark-class2.cmd  spark-sql.cmd     sparkR
D:\apache\spark\bin>spark-shell
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
        at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
        at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
        at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
        at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
D:\apache\spark\bin>
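An aside on reading the error: the JVM descriptor `(ZLjava/lang/String;Ljava/lang/Object;)V` in the trace denotes `checkArgument(boolean, String, Object)`, an overload that older Guava releases such as 14.x do not provide, so a Guava version mismatch between Spark's classpath and what Hadoop 3.2.1 compiles against is a plausible cause. A small illustrative decoder for such descriptors (a hypothetical helper, not Spark or Hadoop code; array types are omitted for brevity):

```python
# Decode a JVM method descriptor, e.g. "(ZLjava/lang/String;Ljava/lang/Object;)V"
# -> ('void', ['boolean', 'java.lang.String', 'java.lang.Object']).
FIELD_TYPES = {
    "Z": "boolean", "B": "byte", "C": "char", "S": "short",
    "I": "int", "J": "long", "F": "float", "D": "double", "V": "void",
}

def decode_descriptor(desc):
    params_raw, ret_raw = desc.lstrip("(").split(")")
    def parse(s):
        types, i = [], 0
        while i < len(s):
            if s[i] == "L":                 # object type: L<binary name>;
                end = s.index(";", i)
                types.append(s[i + 1:end].replace("/", "."))
                i = end + 1
            else:                           # primitive or void
                types.append(FIELD_TYPES[s[i]])
                i += 1
        return types
    return parse(ret_raw)[0], parse(params_raw)

ret, params = decode_descriptor("(ZLjava/lang/String;Ljava/lang/Object;)V")
print(ret, params)  # void ['boolean', 'java.lang.String', 'java.lang.Object']
```

Reading the descriptor this way makes it clear the JVM is looking for a three-argument `checkArgument`, which points at whichever Guava jar wins on the classpath.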
On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sr...@gmail.com> wrote:
> What was the build error? you didn't say. Are you sure it succeeded?
> Try running from the Spark home dir, not bin.
> I know we do run Windows tests and it appears to pass tests, etc.
>
> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
> >
> > Hello,
> >
> > I understand Spark is preferably built on Linux. But I have a Windows
> machine with a slow Virtual Box for Linux. So I wish I am able to build
> and run Spark code on Windows environment.
> >
> > Unfortunately,
> >
> > # Apache Hadoop 2.6.X
> > ./build/mvn -Pyarn -DskipTests clean package
> >
> > # Apache Hadoop 2.7.X and later
> > ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package
> >
> >
> > Both are listed on
> http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
> >
> > But neither works for me (I stay directly under spark root directory and
> run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
> package"
> >
> > and
> >
> > Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests
> clean package"
> >
> > Now build works. But when I run spark-shell. I got the following error.
> >
> > D:\apache\spark\bin>spark-shell
> > Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> >         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> >         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> >         at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> >         at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> >         at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> >         at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
> >         at scala.Option.getOrElse(Option.scala:189)
> >         at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> >         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> >         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> >         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> >         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> >         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> >         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> >         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >
> >
> > Has anyone experienced building and running Spark source code
> successfully on Windows? Could you please share your experience?
> >
> > Thanks a lot!
> >
> > Ping
> >
>
Re: Is it feasible to build and run Spark on Windows?
Posted by Sean Owen <sr...@gmail.com>.
What was the build error? You didn't say. Are you sure the build succeeded?
Try running from the Spark home dir, not bin.
I know we run Windows tests and they appear to pass.
On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pi...@gmail.com> wrote:
>
> Hello,
>
> I understand Spark is preferably built on Linux. But I have a Windows machine with a slow Virtual Box for Linux. So I wish I am able to build and run Spark code on Windows environment.
>
> Unfortunately,
>
> # Apache Hadoop 2.6.X
> ./build/mvn -Pyarn -DskipTests clean package
>
> # Apache Hadoop 2.7.X and later
> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
>
>
> Both are listed on http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
>
> But neither works for me (I stay directly under spark root directory and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package"
>
> and
>
> Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests clean package"
>
> Now build works. But when I run spark-shell. I got the following error.
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
> at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
> at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
> at scala.Option.getOrElse(Option.scala:189)
> at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
> Has anyone experienced building and running Spark source code successfully on Windows? Could you please share your experience?
>
> Thanks a lot!
>
> Ping
>
---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org