Posted to user@spark.apache.org by Arun Ahuja <aa...@gmail.com> on 2014/11/18 16:21:17 UTC

Nightly releases

Are nightly releases posted anywhere?  There are quite a few vital bugfixes
and performance improvements being committed to Spark, and using the latest
commits is useful (or even necessary for some jobs).

Is there a place to post them? It doesn't seem like it would be difficult to
run make-dist nightly and place the result somewhere.

Is it possible to extract this from Jenkins builds?

Thanks,
Arun

Re: Extracting values from a Collection

Posted by Sanjay Subramanian <sa...@yahoo.com.INVALID>.
I am sorry the last line in the code is 
file1Rdd.join(file2RddGrp.mapValues(names => names.toSet)).collect().foreach(println)
so 
My Code
=======
val file1Rdd = sc.textFile("/Users/sansub01/mycode/data/songs/names.txt").map(x => (x.split(",")(0), x.split(",")(1)))
val file2Rdd = sc.textFile("/Users/sansub01/mycode/data/songs/songs.txt").map(x => (x.split(",")(0), x.split(",")(1)))
val file2RddGrp = file2Rdd.groupByKey()
file1Rdd.join(file2RddGrp.mapValues(names => names.toSet)).collect().foreach(println)

Result
=======
(4,(ringo,Set(With a Little Help From My Friends, Octopus's Garden)))
(2,(john,Set(Julia, Nowhere Man)))
(3,(george,Set(While My Guitar Gently Weeps, Norwegian Wood)))
(1,(paul,Set(Yesterday, Michelle)))

Again the question is how do I extract values from the Set ?
thanks
sanjay

From: Sanjay Subramanian <sa...@yahoo.com.INVALID>
To: Arun Ahuja <aa...@gmail.com>; Andrew Ash <an...@andrewash.com>
Cc: user <us...@spark.apache.org>
Sent: Friday, November 21, 2014 10:41 AM
Subject: Extracting values from a Collection

hey guys

names.txt
=========
1,paul
2,john
3,george
4,ringo

songs.txt
=========
1,Yesterday
2,Julia
3,While My Guitar Gently Weeps
4,With a Little Help From My Friends
1,Michelle
2,Nowhere Man
3,Norwegian Wood
4,Octopus's Garden

What I want to do is real simple

Desired Output
==============
(4,(With a Little Help From My Friends, Octopus's Garden))
(2,(Julia, Nowhere Man))
(3,(While My Guitar Gently Weeps, Norwegian Wood))
(1,(Yesterday, Michelle))

My Code
=======
val file1Rdd = sc.textFile("/Users/sansub01/mycode/data/songs/names.txt").map(x => (x.split(",")(0), x.split(",")(1)))
val file2Rdd = sc.textFile("/Users/sansub01/mycode/data/songs/songs.txt").map(x => (x.split(",")(0), x.split(",")(1)))
val file2RddGrp = file2Rdd.groupByKey()
file2Rdd.groupByKey().mapValues(names => names.toSet).collect().foreach(println)

Result
=======
(4,Set(With a Little Help From My Friends, Octopus's Garden))
(2,Set(Julia, Nowhere Man))
(3,Set(While My Guitar Gently Weeps, Norwegian Wood))
(1,Set(Yesterday, Michelle))

How can I extract values from the Set ?


Thanks
sanjay


  

Re: Extracting values from a Collection

Posted by Sanjay Subramanian <sa...@yahoo.com.INVALID>.
I could not iterate through the Set, but I changed the code to get what I was looking for (not elegant, but it gets me going):
package org.medicalsidefx.common.utils

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

import scala.collection.mutable.ArrayBuffer

/**
 * Created by sansub01 on 11/19/14.
 */
object TwoWayJoin2 {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: TwoWayJoin2 <file1> <file2>")
      System.exit(12)
    }

    val sconf = new SparkConf().setMaster("local").setAppName("MedicalSideFx-TwoWayJoin")

    val sc = new SparkContext(sconf)

    val file1 = args(0)
    val file2 = args(1)

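    // Parse each "id,value" line into an (id, value) pair.
    val file1Rdd = sc.textFile(file1).map(x => (x.split(",")(0), x.split(",")(1)))
    // Concatenate all values that share an id into one "|"-separated string.
    val file2Rdd = sc.textFile(file2).map(x => (x.split(",")(0), x.split(",")(1))).reduceByKey((v1,v2) => v1+"|"+v2)

    file1Rdd.collect().foreach(println)
    file2Rdd.collect().foreach(println)

    // Join on id and print each record with the tuple parentheses stripped.
    file1Rdd.join(file2Rdd).collect().foreach( e => println(e.toString.replace("(","").replace(")","")))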

  }
}
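
Stripping every parenthesis from `toString` also removes parentheses that belong to the data (a title such as `Song (Live)`, for instance). A minimal sketch of an alternative, in plain Scala with no Spark dependency: destructure the nested tuple with a pattern match and build the line directly. The names `JoinFormatSketch`, `Joined`, and `formatRecord` are hypothetical, chosen only for this illustration, and the sample record is made up from the data in this thread.

```scala
object JoinFormatSketch {
  // Shape of one joined record: (id, (name, songsJoinedWithPipe)).
  // This alias is an assumption that mirrors the join result in the post.
  type Joined = (String, (String, String))

  // Destructure the nested tuple instead of editing its toString,
  // so parentheses inside a value are left untouched.
  def formatRecord(record: Joined): String = record match {
    case (id, (name, songs)) => s"$id,$name,$songs"
  }

  def main(args: Array[String]): Unit = {
    val sample: Joined = ("1", ("paul", "Yesterday|Michelle"))
    println(formatRecord(sample))
  }
}
```

Since the joined records have the same shape, a call such as `foreach(e => println(formatRecord(e)))` on the collected result should be able to replace the two `replace` calls.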

      From: Jey Kottalam <je...@cs.berkeley.edu>
 To: Sanjay Subramanian <sa...@yahoo.com> 
Cc: Arun Ahuja <aa...@gmail.com>; Andrew Ash <an...@andrewash.com>; user <us...@spark.apache.org> 
 Sent: Friday, November 21, 2014 10:07 PM
 Subject: Extracting values from a Collection
   
Hi Sanjay,

These are instances of the standard Scala collection type "Set", and its documentation can be found by googling the phrase "scala set".

Hope that helps,
-Jey



On Fri, Nov 21, 2014 at 10:41 AM, Sanjay Subramanian <sa...@yahoo.com.invalid> wrote:
> hey guys
>
> names.txt
> =========
> 1,paul
> 2,john
> 3,george
> 4,ringo
>
>
> songs.txt
> =========
> 1,Yesterday
> 2,Julia
> 3,While My Guitar Gently Weeps
> 4,With a Little Help From My Friends
> 1,Michelle
> 2,Nowhere Man
> 3,Norwegian Wood
> 4,Octopus's Garden
>
> What I want to do is real simple
>
> Desired Output
> ==============
> (4,(With a Little Help From My Friends, Octopus's Garden))
> (2,(Julia, Nowhere Man))
> (3,(While My Guitar Gently Weeps, Norwegian Wood))
> (1,(Yesterday, Michelle))
>
>
> My Code
> =======
> val file1Rdd =
> sc.textFile("/Users/sansub01/mycode/data/songs/names.txt").map(x =>
> (x.split(",")(0), x.split(",")(1)))
> val file2Rdd =
> sc.textFile("/Users/sansub01/mycode/data/songs/songs.txt").map(x =>
> (x.split(",")(0), x.split(",")(1)))
> val file2RddGrp = file2Rdd.groupByKey()
> file2Rdd.groupByKey().mapValues(names =>
> names.toSet).collect().foreach(println)
>
> Result
> =======
> (4,Set(With a Little Help From My Friends, Octopus's Garden))
> (2,Set(Julia, Nowhere Man))
> (3,Set(While My Guitar Gently Weeps, Norwegian Wood))
> (1,Set(Yesterday, Michelle))
>
>
> How can I extract values from the Set ?
>
> Thanks
>
> sanjay
>



  

Re: Extracting values from a Collection

Posted by Sanjay Subramanian <sa...@yahoo.com.INVALID>.
Thanks Jey
regards
sanjay

Extracting values from a Collection

Posted by Jey Kottalam <je...@cs.berkeley.edu>.
Hi Sanjay,

These are instances of the standard Scala collection type "Set", and its
documentation can be found by googling the phrase "scala set".

Hope that helps,
-Jey
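
To make the pointer above concrete, here is a minimal sketch in plain Scala (no Spark needed) of a few common ways to get values out of a `Set`. The object name `SetValuesSketch`, the helper names, and the sample data are assumptions for illustration only. Because a `Set` has no guaranteed iteration order, the helpers sort before producing output.

```scala
object SetValuesSketch {
  // Sample data copied from the thread; the name `songs` is illustrative.
  val songs: Set[String] = Set("Yesterday", "Michelle")

  // Join all elements into one string, sorted for a stable result.
  def joined(s: Set[String]): String = s.toList.sorted.mkString(", ")

  // Convert to an indexed collection when positional access is needed.
  def asSortedVector(s: Set[String]): Vector[String] = s.toVector.sorted

  def main(args: Array[String]): Unit = {
    // Visit each element in turn.
    songs.foreach(println)
    // All elements as a single comma-separated string.
    println(joined(songs))
    // First element after sorting.
    println(asSortedVector(songs)(0))
    // Membership test.
    println(songs.contains("Yesterday"))
  }
}
```

The same kind of function can be passed to `mapValues` on the grouped RDD, for example `mapValues(names => names.mkString(", "))`, since the grouped values are ordinary Scala collections.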


Extracting values from a Collection

Posted by Sanjay Subramanian <sa...@yahoo.com.INVALID>.
hey guys

names.txt
=========
1,paul
2,john
3,george
4,ringo

songs.txt
=========
1,Yesterday
2,Julia
3,While My Guitar Gently Weeps
4,With a Little Help From My Friends
1,Michelle
2,Nowhere Man
3,Norwegian Wood
4,Octopus's Garden

What I want to do is real simple

Desired Output
==============
(4,(With a Little Help From My Friends, Octopus's Garden))
(2,(Julia, Nowhere Man))
(3,(While My Guitar Gently Weeps, Norwegian Wood))
(1,(Yesterday, Michelle))

My Code
=======
val file1Rdd = sc.textFile("/Users/sansub01/mycode/data/songs/names.txt").map(x => (x.split(",")(0), x.split(",")(1)))
val file2Rdd = sc.textFile("/Users/sansub01/mycode/data/songs/songs.txt").map(x => (x.split(",")(0), x.split(",")(1)))
val file2RddGrp = file2Rdd.groupByKey()
file2Rdd.groupByKey().mapValues(names => names.toSet).collect().foreach(println)

Result
=======
(4,Set(With a Little Help From My Friends, Octopus's Garden))
(2,Set(Julia, Nowhere Man))
(3,Set(While My Guitar Gently Weeps, Norwegian Wood))
(1,Set(Yesterday, Michelle))

How can I extract values from the Set ?
Thanks
sanjay
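
One way to get from the grouped values to the desired output shape is to render each group with `mkString("(", ", ", ")")` rather than printing the collection itself. The sketch below shows the idea on plain Scala collections so it runs without Spark. The object name `DesiredOutputSketch` and the helpers `grouped` and `render` are hypothetical, and the data is copied from songs.txt above.

```scala
object DesiredOutputSketch {
  // (id, title) pairs copied from songs.txt in the post.
  val songs: Seq[(String, String)] = Seq(
    "1" -> "Yesterday",
    "2" -> "Julia",
    "3" -> "While My Guitar Gently Weeps",
    "4" -> "With a Little Help From My Friends",
    "1" -> "Michelle",
    "2" -> "Nowhere Man",
    "3" -> "Norwegian Wood",
    "4" -> "Octopus's Garden"
  )

  // Group titles by id, keeping input order within each group.
  def grouped(pairs: Seq[(String, String)]): Map[String, Seq[String]] =
    pairs.groupBy(_._1).map { case (id, kvs) => id -> kvs.map(_._2) }

  // Render one record as "(id,(title1, title2))".
  def render(id: String, titles: Iterable[String]): String = {
    val inner = titles.mkString("(", ", ", ")")
    s"($id,$inner)"
  }

  def main(args: Array[String]): Unit =
    // Sort by id only to make the printed order deterministic.
    grouped(songs).toSeq.sortBy(_._1).foreach { case (id, titles) =>
      println(render(id, titles))
    }
}
```

On the RDD itself, the analogous step would be something like `file2Rdd.groupByKey().mapValues(_.mkString("(", ", ", ")"))`, offered here as an untested suggestion rather than a verified fix.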

Re: Nightly releases

Posted by Arun Ahuja <aa...@gmail.com>.
Great - posted here https://issues.apache.org/jira/browse/SPARK-4542

On Fri, Nov 21, 2014 at 1:03 PM, Andrew Ash <an...@andrewash.com> wrote:

> Yes you should file a Jira and echo it out here so others can follow and
> comment on it.  Thanks Arun!
>
> On Fri, Nov 21, 2014 at 12:02 PM, Arun Ahuja <aa...@gmail.com> wrote:
>
>> Great - what can we do to make this happen?  So should I file a JIRA to
>> track?
>>
>> Thanks,
>>
>> Arun
>>
>> On Tue, Nov 18, 2014 at 11:46 AM, Andrew Ash <an...@andrewash.com>
>> wrote:
>>
>>> I can see this being valuable for users wanting to live on the cutting
>>> edge without building CI infrastructure themselves, myself included.  I
>>> think Patrick's recent work on the build scripts for 1.2.0 will make
>>> delivering nightly builds to a public maven repo easier.
>>>
>>> On Tue, Nov 18, 2014 at 10:22 AM, Arun Ahuja <aa...@gmail.com> wrote:
>>>
>>>> Of course we can run this as well to get the latest, but the build is
>>>> fairly long and this seems like a resource many would need.
>>>>
>>>> On Tue, Nov 18, 2014 at 10:21 AM, Arun Ahuja <aa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Are nightly releases posted anywhere?  There are quite a few vital
>>>>> bugfixes and performance improvements being committed to Spark, and using the
>>>>> latest commits is useful (or even necessary for some jobs).
>>>>>
>>>>> Is there a place to post them? It doesn't seem like it would be difficult
>>>>> to run make-dist nightly and place the result somewhere.
>>>>>
>>>>> Is it possible to extract this from Jenkins builds?
>>>>>
>>>>> Thanks,
>>>>> Arun
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Nightly releases

Posted by Andrew Ash <an...@andrewash.com>.
Yes you should file a Jira and echo it out here so others can follow and
comment on it.  Thanks Arun!


Re: Nightly releases

Posted by Arun Ahuja <aa...@gmail.com>.
Great - what can we do to make this happen?  So should I file a JIRA to
track?

Thanks,

Arun


Re: Nightly releases

Posted by Andrew Ash <an...@andrewash.com>.
I can see this being valuable for users wanting to live on the cutting edge
without building CI infrastructure themselves, myself included.  I think
Patrick's recent work on the build scripts for 1.2.0 will make delivering
nightly builds to a public maven repo easier.


Re: Nightly releases

Posted by Arun Ahuja <aa...@gmail.com>.
Of course we can run this as well to get the latest, but the build is
fairly long and this seems like a resource many would need.
