Posted to user@spark.apache.org by Arun Ahuja <aa...@gmail.com> on 2014/11/18 16:21:17 UTC
Nightly releases
Are nightly releases posted anywhere? There are quite a few vital bugfixes
and performance improvements being committed to Spark, and using the latest
commits is useful (or even necessary for some jobs).
Is there a place to post them? It doesn't seem like it would be difficult to
run make-dist nightly and place the result somewhere.
Is it possible to extract this from the Jenkins builds?
Thanks,
Arun
Re: Extracting values from a Collection
Posted by Sanjay Subramanian <sa...@yahoo.com.INVALID>.
Sorry, the last line in the code should be
file1Rdd.join(file2RddGrp.mapValues(names => names.toSet)).collect().foreach(println)
so the full code is:
My Code
=======
val file1Rdd = sc.textFile("/Users/sansub01/mycode/data/songs/names.txt").map(x => (x.split(",")(0), x.split(",")(1)))
val file2Rdd = sc.textFile("/Users/sansub01/mycode/data/songs/songs.txt").map(x => (x.split(",")(0), x.split(",")(1)))
val file2RddGrp = file2Rdd.groupByKey()
file1Rdd.join(file2RddGrp.mapValues(names => names.toSet)).collect().foreach(println)
Result
=======
(4,(ringo,Set(With a Little Help From My Friends, Octopus's Garden)))
(2,(john,Set(Julia, Nowhere Man)))
(3,(george,Set(While My Guitar Gently Weeps, Norwegian Wood)))
(1,(paul,Set(Yesterday, Michelle)))
Again, the question is: how do I extract the values from the Set?
Thanks
sanjay
From: Sanjay Subramanian <sa...@yahoo.com.INVALID>
To: Arun Ahuja <aa...@gmail.com>; Andrew Ash <an...@andrewash.com>
Cc: user <us...@spark.apache.org>
Sent: Friday, November 21, 2014 10:41 AM
Subject: Extracting values from a Collection
hey guys
names.txt
=========
1,paul
2,john
3,george
4,ringo
songs.txt
=========
1,Yesterday
2,Julia
3,While My Guitar Gently Weeps
4,With a Little Help From My Friends
1,Michelle
2,Nowhere Man
3,Norwegian Wood
4,Octopus's Garden
What I want to do is real simple
Desired Output
==============
(4,(With a Little Help From My Friends, Octopus's Garden))
(2,(Julia, Nowhere Man))
(3,(While My Guitar Gently Weeps, Norwegian Wood))
(1,(Yesterday, Michelle))
My Code
=======
val file1Rdd = sc.textFile("/Users/sansub01/mycode/data/songs/names.txt").map(x => (x.split(",")(0), x.split(",")(1)))
val file2Rdd = sc.textFile("/Users/sansub01/mycode/data/songs/songs.txt").map(x => (x.split(",")(0), x.split(",")(1)))
val file2RddGrp = file2Rdd.groupByKey()
file2Rdd.groupByKey().mapValues(names => names.toSet).collect().foreach(println)
Result
=======
(4,Set(With a Little Help From My Friends, Octopus's Garden))
(2,Set(Julia, Nowhere Man))
(3,Set(While My Guitar Gently Weeps, Norwegian Wood))
(1,Set(Yesterday, Michelle))
How can I extract values from the Set ?
Thanks
sanjay
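
A Scala Set is an ordinary collection, so one way to get at its values is to iterate over it with a for-comprehension or to join it into a string with mkString. The following is only a minimal sketch on a local Seq standing in for two rows of the joined result; the object name JoinedSetValues and the helpers rows and line are made up for this illustration, and on an RDD the same logic would presumably be applied through map or flatMap before collect().

```scala
object JoinedSetValues {
  // Stand-in for two rows of the joined output: (id, (name, Set[song])).
  // collect() returns an Array; a Seq is used here only for brevity.
  val joined: Seq[(String, (String, Set[String]))] = Seq(
    ("1", ("paul", Set("Yesterday", "Michelle"))),
    ("2", ("john", Set("Julia", "Nowhere Man")))
  )

  // One output row per song: the Set is flattened by iterating over it.
  def rows(data: Seq[(String, (String, Set[String]))]): Seq[(String, String, String)] =
    for {
      (id, (name, songs)) <- data
      song <- songs
    } yield (id, name, song)

  // One line per id, with the Set joined into a comma-separated string.
  def line(entry: (String, (String, Set[String]))): String = entry match {
    case (id, (_, songs)) => "(" + id + ",(" + songs.mkString(", ") + "))"
  }

  def main(args: Array[String]): Unit = {
    rows(joined).foreach(println)
    joined.map(line).foreach(println)
  }
}
```

Note that a Set makes no general promise about iteration order, so the order of songs within a line should not be relied upon.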
Re: Extracting values from a Collection
Posted by Sanjay Subramanian <sa...@yahoo.com.INVALID>.
I could not iterate through the Set, so I changed the code to get what I was looking for (not elegant, but it gets me going).
package org.medicalsidefx.common.utils
import org.apache.spark.{SparkConf, SparkContext}
// Implicit conversions that provide the pair-RDD operations (join, reduceByKey)
import org.apache.spark.SparkContext._
/**
 * Created by sansub01 on 11/19/14.
 */
object TwoWayJoin2 {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: TwoWayJoin2 <file1> <file2>")
      System.exit(12)
    }
    val sconf = new SparkConf().setMaster("local").setAppName("MedicalSideFx-TwoWayJoin")
    val sc = new SparkContext(sconf)
    val file1 = args(0)
    val file2 = args(1)
    // (id, name) pairs
    val file1Rdd = sc.textFile(file1).map(x => (x.split(",")(0), x.split(",")(1)))
    // (id, "song1|song2|...") pairs: all songs for an id concatenated into one string
    val file2Rdd = sc.textFile(file2).map(x => (x.split(",")(0), x.split(",")(1))).reduceByKey((v1, v2) => v1 + "|" + v2)
    file1Rdd.collect().foreach(println)
    file2Rdd.collect().foreach(println)
    // Join on id and print each row with the tuple parentheses stripped
    file1Rdd.join(file2Rdd).collect().foreach(e => println(e.toString.replace("(", "").replace(")", "")))
  }
}
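
One caveat with printing via toString and replace is that it also strips parentheses that belong to the data. A possible alternative, sketched here on plain tuples without Spark, is to take the tuple apart with a pattern match and build the line from its fields. The names FormatJoined, viaReplace and viaPattern are hypothetical, and the parenthesised song title is an assumed input that does not appear in songs.txt.

```scala
object FormatJoined {
  // Shape of one joined row in the program above: (id, (name, "song1|song2")).
  type Row = (String, (String, String))

  // The approach from the program above: strip every parenthesis from toString.
  def viaReplace(e: Row): String =
    e.toString.replace("(", "").replace(")", "")

  // Alternative: destructure the tuple and concatenate its fields.
  def viaPattern(e: Row): String = e match {
    case (id, (name, songs)) => id + "," + name + "," + songs
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical row: this title is not part of the thread's data.
    val row: Row = ("3", ("george", "Norwegian Wood (This Bird Has Flown)|While My Guitar Gently Weeps"))
    println(viaReplace(row)) // parentheses inside the title are removed too
    println(viaPattern(row)) // the title is kept as written
  }
}
```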
From: Jey Kottalam <je...@cs.berkeley.edu>
To: Sanjay Subramanian <sa...@yahoo.com>
Cc: Arun Ahuja <aa...@gmail.com>; Andrew Ash <an...@andrewash.com>; user <us...@spark.apache.org>
Sent: Friday, November 21, 2014 10:07 PM
Subject: Extracting values from a Collection
Hi Sanjay,
These are instances of the standard Scala collection type "Set", and its documentation can be found by googling the phrase "scala set".
Hope that helps,
-Jey
On Fri, Nov 21, 2014 at 10:41 AM, Sanjay Subramanian <sa...@yahoo.com.invalid> wrote:
> hey guys
>
> names.txt
> =========
> 1,paul
> 2,john
> 3,george
> 4,ringo
>
>
> songs.txt
> =========
> 1,Yesterday
> 2,Julia
> 3,While My Guitar Gently Weeps
> 4,With a Little Help From My Friends
> 1,Michelle
> 2,Nowhere Man
> 3,Norwegian Wood
> 4,Octopus's Garden
>
> What I want to do is real simple
>
> Desired Output
> ==============
> (4,(With a Little Help From My Friends, Octopus's Garden))
> (2,(Julia, Nowhere Man))
> (3,(While My Guitar Gently Weeps, Norwegian Wood))
> (1,(Yesterday, Michelle))
>
>
> My Code
> =======
> val file1Rdd =
> sc.textFile("/Users/sansub01/mycode/data/songs/names.txt").map(x =>
> (x.split(",")(0), x.split(",")(1)))
> val file2Rdd =
> sc.textFile("/Users/sansub01/mycode/data/songs/songs.txt").map(x =>
> (x.split(",")(0), x.split(",")(1)))
> val file2RddGrp = file2Rdd.groupByKey()
> file2Rdd.groupByKey().mapValues(names =>
> names.toSet).collect().foreach(println)
>
> Result
> =======
> (4,Set(With a Little Help From My Friends, Octopus's Garden))
> (2,Set(Julia, Nowhere Man))
> (3,Set(While My Guitar Gently Weeps, Norwegian Wood))
> (1,Set(Yesterday, Michelle))
>
>
> How can I extract values from the Set ?
>
> Thanks
>
> sanjay
>
Re: Extracting values from a Collection
Posted by Sanjay Subramanian <sa...@yahoo.com.INVALID>.
Thanks Jey
regards
sanjay
Extracting values from a Collection
Posted by Jey Kottalam <je...@cs.berkeley.edu>.
Hi Sanjay,
These are instances of the standard Scala collection type "Set", and its
documentation can be found by googling the phrase "scala set".
Hope that helps,
-Jey
On Fri, Nov 21, 2014 at 10:41 AM, Sanjay Subramanian
<sa...@yahoo.com.invalid> wrote:
> hey guys
>
> names.txt
> =========
> 1,paul
> 2,john
> 3,george
> 4,ringo
>
>
> songs.txt
> =========
> 1,Yesterday
> 2,Julia
> 3,While My Guitar Gently Weeps
> 4,With a Little Help From My Friends
> 1,Michelle
> 2,Nowhere Man
> 3,Norwegian Wood
> 4,Octopus's Garden
>
> What I want to do is real simple
>
> Desired Output
> ==============
> (4,(With a Little Help From My Friends, Octopus's Garden))
> (2,(Julia, Nowhere Man))
> (3,(While My Guitar Gently Weeps, Norwegian Wood))
> (1,(Yesterday, Michelle))
>
>
> My Code
> =======
> val file1Rdd =
> sc.textFile("/Users/sansub01/mycode/data/songs/names.txt").map(x =>
> (x.split(",")(0), x.split(",")(1)))
> val file2Rdd =
> sc.textFile("/Users/sansub01/mycode/data/songs/songs.txt").map(x =>
> (x.split(",")(0), x.split(",")(1)))
> val file2RddGrp = file2Rdd.groupByKey()
> file2Rdd.groupByKey().mapValues(names =>
> names.toSet).collect().foreach(println)
>
> Result
> =======
> (4,Set(With a Little Help From My Friends, Octopus's Garden))
> (2,Set(Julia, Nowhere Man))
> (3,Set(While My Guitar Gently Weeps, Norwegian Wood))
> (1,Set(Yesterday, Michelle))
>
>
> How can I extract values from the Set ?
>
> Thanks
>
> sanjay
>
Extracting values from a Collection
Posted by Sanjay Subramanian <sa...@yahoo.com.INVALID>.
hey guys
names.txt
=========
1,paul
2,john
3,george
4,ringo
songs.txt
=========
1,Yesterday
2,Julia
3,While My Guitar Gently Weeps
4,With a Little Help From My Friends
1,Michelle
2,Nowhere Man
3,Norwegian Wood
4,Octopus's Garden
What I want to do is real simple
Desired Output
==============
(4,(With a Little Help From My Friends, Octopus's Garden))
(2,(Julia, Nowhere Man))
(3,(While My Guitar Gently Weeps, Norwegian Wood))
(1,(Yesterday, Michelle))
My Code
=======
val file1Rdd = sc.textFile("/Users/sansub01/mycode/data/songs/names.txt").map(x => (x.split(",")(0), x.split(",")(1)))
val file2Rdd = sc.textFile("/Users/sansub01/mycode/data/songs/songs.txt").map(x => (x.split(",")(0), x.split(",")(1)))
val file2RddGrp = file2Rdd.groupByKey()
file2Rdd.groupByKey().mapValues(names => names.toSet).collect().foreach(println)
Result
=======
(4,Set(With a Little Help From My Friends, Octopus's Garden))
(2,Set(Julia, Nowhere Man))
(3,Set(While My Guitar Gently Weeps, Norwegian Wood))
(1,Set(Yesterday, Michelle))
How can I extract values from the Set ?
Thanks
sanjay
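
The map step in the code above calls split twice per line and would cut a title at any comma it contains. A minimal sketch of a more defensive parse follows, written against local collections rather than an RDD; ParseLines and parseLine are hypothetical names, the "malformed" input line is invented for the example, and groupBy is used here only as a local stand-in for groupByKey.

```scala
object ParseLines {
  // Split on the first comma only, so a title that itself contains a comma stays whole.
  // A line without a comma yields None instead of an ArrayIndexOutOfBoundsException.
  def parseLine(line: String): Option[(String, String)] =
    line.split(",", 2) match {
      case Array(id, value) => Some((id.trim, value.trim))
      case _                => None
    }

  def main(args: Array[String]): Unit = {
    // A few lines from songs.txt plus one invented malformed line.
    val songs = Seq("1,Yesterday", "2,Julia", "1,Michelle", "malformed")
    // Drop unparseable lines, then collect the titles for each id.
    val grouped: Map[String, Seq[String]] =
      songs.flatMap(parseLine).groupBy(_._1).map { case (id, kvs) => (id, kvs.map(_._2)) }
    grouped.foreach { case (id, titles) =>
      println("(" + id + ",(" + titles.mkString(", ") + "))")
    }
  }
}
```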
Re: Nightly releases
Posted by Arun Ahuja <aa...@gmail.com>.
Great - posted here https://issues.apache.org/jira/browse/SPARK-4542
On Fri, Nov 21, 2014 at 1:03 PM, Andrew Ash <an...@andrewash.com> wrote:
> Yes you should file a Jira and echo it out here so others can follow and
> comment on it. Thanks Arun!
>
> On Fri, Nov 21, 2014 at 12:02 PM, Arun Ahuja <aa...@gmail.com> wrote:
>
>> Great - what can we do to make this happen? So should I file a JIRA to
>> track?
>>
>> Thanks,
>>
>> Arun
>>
>> On Tue, Nov 18, 2014 at 11:46 AM, Andrew Ash <an...@andrewash.com>
>> wrote:
>>
>>> I can see this being valuable for users wanting to live on the cutting
>>> edge without building CI infrastructure themselves, myself included. I
>>> think Patrick's recent work on the build scripts for 1.2.0 will make
>>> delivering nightly builds to a public maven repo easier.
>>>
>>> On Tue, Nov 18, 2014 at 10:22 AM, Arun Ahuja <aa...@gmail.com> wrote:
>>>
>>>> Of course we can run this as well to get the latest, but the build is
>>>> fairly long and this seems like a resource many would need.
>>>>
>>>> On Tue, Nov 18, 2014 at 10:21 AM, Arun Ahuja <aa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Are nightly releases posted anywhere? There are quite a few vital
>>>>> bugfixes and performance improvements being committed to Spark, and using
>>>>> the latest commits is useful (or even necessary for some jobs).
>>>>>
>>>>> Is there a place to post them? It doesn't seem like it would be difficult
>>>>> to run make-dist nightly and place the result somewhere.
>>>>>
>>>>> Is it possible to extract this from the Jenkins builds?
>>>>>
>>>>> Thanks,
>>>>> Arun
Re: Nightly releases
Posted by Andrew Ash <an...@andrewash.com>.
Yes you should file a Jira and echo it out here so others can follow and
comment on it. Thanks Arun!
On Fri, Nov 21, 2014 at 12:02 PM, Arun Ahuja <aa...@gmail.com> wrote:
> Great - what can we do to make this happen? So should I file a JIRA to
> track?
>
> Thanks,
>
> Arun
>
> On Tue, Nov 18, 2014 at 11:46 AM, Andrew Ash <an...@andrewash.com> wrote:
>
>> I can see this being valuable for users wanting to live on the cutting
>> edge without building CI infrastructure themselves, myself included. I
>> think Patrick's recent work on the build scripts for 1.2.0 will make
>> delivering nightly builds to a public maven repo easier.
>>
>> On Tue, Nov 18, 2014 at 10:22 AM, Arun Ahuja <aa...@gmail.com> wrote:
>>
>>> Of course we can run this as well to get the latest, but the build is
>>> fairly long and this seems like a resource many would need.
>>>
>>> On Tue, Nov 18, 2014 at 10:21 AM, Arun Ahuja <aa...@gmail.com> wrote:
>>>
>>>> Are nightly releases posted anywhere? There are quite a few vital
>>>> bugfixes and performance improvements being committed to Spark, and using
>>>> the latest commits is useful (or even necessary for some jobs).
>>>>
>>>> Is there a place to post them? It doesn't seem like it would be difficult
>>>> to run make-dist nightly and place the result somewhere.
>>>>
>>>> Is it possible to extract this from the Jenkins builds?
>>>>
>>>> Thanks,
>>>> Arun
Re: Nightly releases
Posted by Arun Ahuja <aa...@gmail.com>.
Great - what can we do to make this happen? So should I file a JIRA to
track?
Thanks,
Arun
On Tue, Nov 18, 2014 at 11:46 AM, Andrew Ash <an...@andrewash.com> wrote:
> I can see this being valuable for users wanting to live on the cutting
> edge without building CI infrastructure themselves, myself included. I
> think Patrick's recent work on the build scripts for 1.2.0 will make
> delivering nightly builds to a public maven repo easier.
>
> On Tue, Nov 18, 2014 at 10:22 AM, Arun Ahuja <aa...@gmail.com> wrote:
>
>> Of course we can run this as well to get the latest, but the build is
>> fairly long and this seems like a resource many would need.
>>
>> On Tue, Nov 18, 2014 at 10:21 AM, Arun Ahuja <aa...@gmail.com> wrote:
>>
>>> Are nightly releases posted anywhere? There are quite a few vital
>>> bugfixes and performance improvements being committed to Spark, and using
>>> the latest commits is useful (or even necessary for some jobs).
>>>
>>> Is there a place to post them? It doesn't seem like it would be difficult
>>> to run make-dist nightly and place the result somewhere.
>>>
>>> Is it possible to extract this from the Jenkins builds?
>>>
>>> Thanks,
>>> Arun
Re: Nightly releases
Posted by Andrew Ash <an...@andrewash.com>.
I can see this being valuable for users wanting to live on the cutting edge
without building CI infrastructure themselves, myself included. I think
Patrick's recent work on the build scripts for 1.2.0 will make delivering
nightly builds to a public maven repo easier.
On Tue, Nov 18, 2014 at 10:22 AM, Arun Ahuja <aa...@gmail.com> wrote:
> Of course we can run this as well to get the latest, but the build is
> fairly long and this seems like a resource many would need.
>
> On Tue, Nov 18, 2014 at 10:21 AM, Arun Ahuja <aa...@gmail.com> wrote:
>
>> Are nightly releases posted anywhere? There are quite a few vital
>> bugfixes and performance improvements being committed to Spark, and using
>> the latest commits is useful (or even necessary for some jobs).
>>
>> Is there a place to post them? It doesn't seem like it would be difficult
>> to run make-dist nightly and place the result somewhere.
>>
>> Is it possible to extract this from the Jenkins builds?
>>
>> Thanks,
>> Arun
Re: Nightly releases
Posted by Arun Ahuja <aa...@gmail.com>.
Of course we can run this as well to get the latest, but the build is
fairly long and this seems like a resource many would need.
On Tue, Nov 18, 2014 at 10:21 AM, Arun Ahuja <aa...@gmail.com> wrote:
> Are nightly releases posted anywhere? There are quite a few vital
> bugfixes and performance improvements being committed to Spark, and using
> the latest commits is useful (or even necessary for some jobs).
>
> Is there a place to post them? It doesn't seem like it would be difficult
> to run make-dist nightly and place the result somewhere.
>
> Is it possible to extract this from the Jenkins builds?
>
> Thanks,
> Arun