Posted to dev@spark.apache.org by GlennStrycker <gl...@gmail.com> on 2014/05/16 18:41:34 UTC
Scala examples for Spark do not work as written in documentation
On the webpage http://spark.apache.org/examples.html, there is an example
written as
val count = spark.parallelize(1 to NUM_SAMPLES).map(i =>
val x = Math.random()
val y = Math.random()
if (x*x + y*y < 1) 1 else 0
).reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)
This does not execute in the Spark shell; it fails with the error:
<console>:2: error: illegal start of simple expression
val x = Math.random()
^
If I rewrite the example slightly, adding {}, it works:
val count = spark.parallelize(1 to 10000).map(i =>
{
val x = Math.random()
val y = Math.random()
if (x*x + y*y < 1) 1 else 0
}
).reduce(_ + _)
println("Pi is roughly " + 4.0 * count / 10000.0)
--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Scala-examples-for-Spark-do-not-work-as-written-in-documentation-tp6593.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
Re: Scala examples for Spark do not work as written in documentation
Posted by Reynold Xin <rx...@databricks.com>.
Thanks for pointing it out. We should update the website to fix the code.
val count = spark.parallelize(1 to NUM_SAMPLES).map { i =>
val x = Math.random()
val y = Math.random()
if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)
Re: Scala examples for Spark do not work as written in documentation
Posted by Patrick Wendell <pw...@gmail.com>.
Those are pretty old - but I think the reason Matei did that was to
make it less confusing for brand new users. `spark` is actually a
valid identifier because it's just a variable name (val spark = new
SparkContext()) but I agree this could be confusing for users who want
to drop into the shell.
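To make the distinction concrete: the website examples presuppose a binding named `spark` that a standalone application creates itself, whereas `spark-shell` predefines the context as `sc`. A hedged setup sketch for the era's API (app name and master URL are placeholder values, not from the thread):

```scala
// Standalone-app setup assumed by the website examples (Spark 1.x API).
// In spark-shell none of this is needed: the context already exists as `sc`.
import org.apache.spark.{SparkConf, SparkContext}

object PiApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PiApp").setMaster("local[*]")
    val spark = new SparkContext(conf)  // the `spark` the examples refer to

    val NUM_SAMPLES = 100000
    val count = spark.parallelize(1 to NUM_SAMPLES).map { i =>
      val x = Math.random()
      val y = Math.random()
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)

    spark.stop()
  }
}
```

A shell user pasting the examples only needs to replace `spark.parallelize` with `sc.parallelize`.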
>>
Re: Scala examples for Spark do not work as written in documentation
Posted by Will Benton <wi...@redhat.com>.
Hey, sorry to reanimate this thread, but just a quick question: why do the examples (on http://spark.apache.org/examples.html) use "spark" for the SparkContext reference? This is minor, but it seems like it could be a little confusing for people who want to run them in the shell and need to change "spark" to "sc". (I noticed because this was a speedbump for a colleague who is trying out Spark.)
thanks,
wb
Re: Scala examples for Spark do not work as written in documentation
Posted by Andy Konwinski <an...@gmail.com>.
I fixed the bug, but I kept the parameter "i" instead of "_" since that (1)
keeps it more parallel to the python and java versions which also use
functions with a named variable and (2) doesn't require readers to know
this particular use of the "_" syntax in Scala.
Thanks for catching this Glenn.
Andy
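The two forms Andy weighs are interchangeable when the parameter is unused; a plain-Scala sketch of the equivalence (names are illustrative):

```scala
// A named parameter the body ignores, versus the underscore placeholder.
// Both build the same function; `_` is just shorthand for an unused argument.
object UnderscoreDemo {
  val n = 1000

  // Named parameter, more readable for newcomers and parallel to the
  // Python/Java versions of the example:
  val withName = (1 to n).map { i => 1 }.reduce(_ + _)

  // Underscore placeholder, idiomatic when the value is never used:
  val withUnderscore = (1 to n).map { _ => 1 }.reduce(_ + _)
}
```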
Re: Scala examples for Spark do not work as written in documentation
Posted by Mark Hamstra <ma...@clearstorydata.com>.
Sorry, looks like an extra line got inserted in there. One more try:
val count = spark.parallelize(1 to NUM_SAMPLES).map { _ =>
val x = Math.random()
val y = Math.random()
if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
Re: Scala examples for Spark do not work as written in documentation
Posted by GlennStrycker <gl...@gmail.com>.
Why does the reduce function only seem to work on sums of values of the same
type, and not support other functional forms?
I am having trouble in another example where, instead of 1s and 0s, the
output of the map function is a pair, e.g. A=(1,2) and B=(3,4). I need a
reduce function that can return something more complicated, such as reduce( (A,B)
=> (arbitrary fcn1 of A and B, arbitrary fcn2 of A and B) ), but all I am
getting is reduce( (A,B) => (arbitrary fcn1 of A, arbitrary fcn2 of A) ).
See
http://apache-spark-developers-list.1001551.n3.nabble.com/reduce-only-removes-duplicates-cannot-be-arbitrary-function-td6606.html
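For what it's worth, `reduce` is not limited to sums: it accepts any binary function on the element type, provided the function is associative (and, for Spark's distributed `reduce`, also commutative) so that partial results can be combined in any order. A plain-Scala sketch over pairs like the A=(1,2), B=(3,4) case described above (the combining functions are illustrative):

```scala
// reduce over tuples with an arbitrary binary function:
// sum the first components, multiply the second components.
object TupleReduce {
  val pairs = Seq((1, 2), (3, 4), (5, 6))

  val combined = pairs.reduce { (a, b) =>
    (a._1 + b._1, a._2 * b._2)   // each component combined independently
  }
  // combined == (1 + 3 + 5, 2 * 4 * 6) == (9, 48)
}
```

If the result seems to depend only on one operand, the usual culprit is a combining function that ignores its second argument, not a limitation of `reduce` itself.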
Re: Scala examples for Spark do not work as written in documentation
Posted by Mark Hamstra <ma...@clearstorydata.com>.
Actually, the better way to write the multi-line closure would be:
val count = spark.parallelize(1 to NUM_SAMPLES).map { _ =>
val x = Math.random()
val y = Math.random()
if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)