Posted to user@spark.apache.org by satish chandra j <js...@gmail.com> on 2015/08/20 12:05:52 UTC

Transformation not happening for reduceByKey or GroupByKey

Hi All,
I have data in an RDD as shown below:

RDD: Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))


I am expecting the output Array((0,3), (1,50), (2,40)): just a sum over the
values for each key

Code:
RDD.reduceByKey((x,y) => x+y)
RDD.take(3)

Result in console:
RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey at
<console>:73
res:Array[(Int,Int)] = Array()

Command used:

dse spark --master local --jars postgresql-9.4-1201.jar -i  <ScriptFile>


Please let me know what is missing in my code, as the resulting Array is
empty



Regards,
Satish
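
For reference, the intended computation, written so that the transformed RDD
is actually captured, would look something like this (a minimal sketch,
assuming the shell's built-in SparkContext sc; reduceByKey returns a new RDD
and leaves the original untouched):

val data = sc.makeRDD(Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40)))
val summed = data.reduceByKey(_ + _)   // capture the new RDD returned here
summed.collect()                       // expected: Array((0,3), (1,50), (2,40))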

Re: Transformation not happening for reduceByKey or GroupByKey

Posted by satish chandra j <js...@gmail.com>.
Hi All,
Could anybody let me know what it is that I am missing here? It should work,
as it is a basic transformation.

Please let me know if any additional information is required.

Regards,
Satish

On Thu, Aug 20, 2015 at 3:35 PM, satish chandra j <js...@gmail.com>
wrote:

> Hi All,
> I have data in an RDD as shown below:
>
> RDD: Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))
>
>
> I am expecting the output Array((0,3), (1,50), (2,40)): just a sum over the
> values for each key
>
> Code:
> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)
>
> Result in console:
> RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey at
> <console>:73
> res:Array[(Int,Int)] = Array()
>
> Command used:
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i  <ScriptFile>
>
>
> Please let me know what is missing in my code, as the resulting Array is
> empty
>
>
>
> Regards,
> Satish
>
>

Re: Transformation not happening for reduceByKey or GroupByKey

Posted by satish chandra j <js...@gmail.com>.
Hi All,

For users following this mail chain, please find below the issue and the
respective fix:

*reduceByKey: Non-working snippet*

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

// Creates a second SparkContext with default settings inside the shell
// session, which already provides one.
val conf = new SparkConf()
val sc = new SparkContext(conf)

val dataRDD = sc.makeRDD(Seq((0,1),(0,2),(1,2),(1,3),(2,4)))
dataRDD.reduceByKey(_+_).collect

Result: Array() (empty)

*reduceByKey: Working snippet*

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

// Allowing multiple contexts lets the script's context coexist with the one
// the shell already created (note the setting belongs on SparkConf, not on
// SparkContext).
val conf = new SparkConf().set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(conf)

val dataRDD = sc.makeRDD(Seq((0,1),(0,2),(1,2),(1,3),(2,4)))
dataRDD.reduceByKey(_+_).collect

Result: Array((0,3),(1,5),(2,4))

Regards,
Satish Chandra
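
An alternative that avoids creating a second SparkContext altogether is to
let the script reuse the sc that dse spark / spark-shell already provides
(a sketch, not the fix posted above; the file name test.spark is
illustrative):

// test.spark -- run with: dse spark --master local -i test.spark
// Reuses the shell's existing SparkContext instead of constructing a new one.
val dataRDD = sc.makeRDD(Seq((0, 1), (0, 2), (1, 2), (1, 3), (2, 4)))
dataRDD.reduceByKey(_ + _).collect()   // expected: Array((0,3), (1,5), (2,4))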


On Sat, Aug 22, 2015 at 11:27 AM, satish chandra j <jsatishchandra@gmail.com
> wrote:

> Hi All,
> I am currently using DSE 4.7, with Spark version 1.2.2
>
> Regards,
> Satish
>
> On Fri, Aug 21, 2015 at 7:30 PM, java8964 <ja...@hotmail.com> wrote:
>
>> What version of Spark are you using, or is it the one that comes with DSE 4.7?
>>
>> We just cannot reproduce it in Spark.
>>
>> yzhang@localhost>$ more test.spark
>> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs.reduceByKey((x,y) => x + y).collect
>> yzhang@localhost>$ ~/spark/bin/spark-shell --master local -i test.spark
>> Welcome to
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/  '_/
>>    /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
>>       /_/
>>
>> Using Scala version 2.10.4
>> Spark context available as sc.
>> SQL context available as sqlContext.
>> Loading test.spark...
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[0] at
>> makeRDD at <console>:21
>> 15/08/21 09:58:51 WARN SizeEstimator: Failed to check whether
>> UseCompressedOops is set; assuming yes
>> res0: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> Yong
>>
>>
>> ------------------------------
>> Date: Fri, 21 Aug 2015 19:24:09 +0530
>> Subject: Re: Transformation not happening for reduceByKey or GroupByKey
>> From: jsatishchandra@gmail.com
>> To: abhishsi@tetrationanalytics.com
>> CC: user@spark.apache.org
>>
>>
>> Hi Abhishek,
>>
>> I have tried that as well, but rdd2 is empty
>>
>> Regards,
>> Satish
>>
>> On Fri, Aug 21, 2015 at 6:47 PM, Abhishek R. Singh <
>> abhishsi@tetrationanalytics.com> wrote:
>>
>> You had:
>>
>> > RDD.reduceByKey((x,y) => x+y)
>> > RDD.take(3)
>>
>> Maybe try:
>>
>> > val rdd2 = RDD.reduceByKey((x,y) => x+y)
>> > rdd2.take(3)
>>
>> -Abhishek-
>>
>> On Aug 20, 2015, at 3:05 AM, satish chandra j <js...@gmail.com>
>> wrote:
>>
>> > Hi All,
>> > I have data in an RDD as shown below:
>> >
>> > RDD: Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))
>> >
>> >
>> > I am expecting the output Array((0,3), (1,50), (2,40)): just a sum over
>> the values for each key
>> >
>> > Code:
>> > RDD.reduceByKey((x,y) => x+y)
>> > RDD.take(3)
>> >
>> > Result in console:
>> > RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey
>> at <console>:73
>> > res:Array[(Int,Int)] = Array()
>> >
>> > Command used:
>> >
>> > dse spark --master local --jars postgresql-9.4-1201.jar -i  <ScriptFile>
>> >
>> >
>> > Please let me know what is missing in my code, as the resulting Array is
>> empty
>> >
>> >
>> >
>> > Regards,
>> > Satish
>> >
>>
>>
>>
>

Re: Transformation not happening for reduceByKey or GroupByKey

Posted by satish chandra j <js...@gmail.com>.
Hi All,
I am currently using DSE 4.7, with Spark version 1.2.2

Regards,
Satish

On Fri, Aug 21, 2015 at 7:30 PM, java8964 <ja...@hotmail.com> wrote:

> What version of Spark are you using, or is it the one that comes with DSE 4.7?
>
> We just cannot reproduce it in Spark.
>
> yzhang@localhost>$ more test.spark
> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
> pairs.reduceByKey((x,y) => x + y).collect
> yzhang@localhost>$ ~/spark/bin/spark-shell --master local -i test.spark
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
>       /_/
>
> Using Scala version 2.10.4
> Spark context available as sc.
> SQL context available as sqlContext.
> Loading test.spark...
> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[0] at
> makeRDD at <console>:21
> 15/08/21 09:58:51 WARN SizeEstimator: Failed to check whether
> UseCompressedOops is set; assuming yes
> res0: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>
> Yong
>
>
> ------------------------------
> Date: Fri, 21 Aug 2015 19:24:09 +0530
> Subject: Re: Transformation not happening for reduceByKey or GroupByKey
> From: jsatishchandra@gmail.com
> To: abhishsi@tetrationanalytics.com
> CC: user@spark.apache.org
>
>
> Hi Abhishek,
>
> I have tried that as well, but rdd2 is empty
>
> Regards,
> Satish
>
> On Fri, Aug 21, 2015 at 6:47 PM, Abhishek R. Singh <
> abhishsi@tetrationanalytics.com> wrote:
>
> You had:
>
> > RDD.reduceByKey((x,y) => x+y)
> > RDD.take(3)
>
> Maybe try:
>
> > val rdd2 = RDD.reduceByKey((x,y) => x+y)
> > rdd2.take(3)
>
> -Abhishek-
>
> On Aug 20, 2015, at 3:05 AM, satish chandra j <js...@gmail.com>
> wrote:
>
> > Hi All,
> > I have data in an RDD as shown below:
> >
> > RDD: Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))
> >
> >
> > I am expecting the output Array((0,3), (1,50), (2,40)): just a sum over
> the values for each key
> >
> > Code:
> > RDD.reduceByKey((x,y) => x+y)
> > RDD.take(3)
> >
> > Result in console:
> > RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey
> at <console>:73
> > res:Array[(Int,Int)] = Array()
> >
> > Command used:
> >
> > dse spark --master local --jars postgresql-9.4-1201.jar -i  <ScriptFile>
> >
> >
> > Please let me know what is missing in my code, as the resulting Array is
> empty
> >
> >
> >
> > Regards,
> > Satish
> >
>
>
>

RE: Transformation not happening for reduceByKey or GroupByKey

Posted by java8964 <ja...@hotmail.com>.
What version of Spark are you using, or is it the one that comes with DSE 4.7?

We just cannot reproduce it in Spark.

yzhang@localhost>$ more test.spark
val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs.reduceByKey((x,y) => x + y).collect
yzhang@localhost>$ ~/spark/bin/spark-shell --master local -i test.spark
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
      /_/

Using Scala version 2.10.4
Spark context available as sc.
SQL context available as sqlContext.
Loading test.spark...
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[0] at
makeRDD at <console>:21
15/08/21 09:58:51 WARN SizeEstimator: Failed to check whether
UseCompressedOops is set; assuming yes
res0: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))

Yong

Date: Fri, 21 Aug 2015 19:24:09 +0530
Subject: Re: Transformation not happening for reduceByKey or GroupByKey
From: jsatishchandra@gmail.com
To: abhishsi@tetrationanalytics.com
CC: user@spark.apache.org

Hi Abhishek,

I have tried that as well, but rdd2 is empty

Regards,
Satish
On Fri, Aug 21, 2015 at 6:47 PM, Abhishek R. Singh <ab...@tetrationanalytics.com> wrote:
You had:

> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)

Maybe try:

> val rdd2 = RDD.reduceByKey((x,y) => x+y)
> rdd2.take(3)

-Abhishek-

On Aug 20, 2015, at 3:05 AM, satish chandra j <js...@gmail.com> wrote:

> Hi All,
> I have data in an RDD as shown below:
>
> RDD: Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))
>
> I am expecting the output Array((0,3), (1,50), (2,40)): just a sum over the values for each key
>
> Code:
> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)
>
> Result in console:
> RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey at <console>:73
> res:Array[(Int,Int)] = Array()
>
> Command used:
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i  <ScriptFile>
>
> Please let me know what is missing in my code, as the resulting Array is empty
>
> Regards,
> Satish

Re: Transformation not happening for reduceByKey or GroupByKey

Posted by satish chandra j <js...@gmail.com>.
Hi Abhishek,

I have tried that as well, but rdd2 is empty

Regards,
Satish

On Fri, Aug 21, 2015 at 6:47 PM, Abhishek R. Singh <
abhishsi@tetrationanalytics.com> wrote:

> You had:
>
> > RDD.reduceByKey((x,y) => x+y)
> > RDD.take(3)
>
> Maybe try:
>
> > val rdd2 = RDD.reduceByKey((x,y) => x+y)
> > rdd2.take(3)
>
> -Abhishek-
>
> On Aug 20, 2015, at 3:05 AM, satish chandra j <js...@gmail.com>
> wrote:
>
> > Hi All,
> > I have data in an RDD as shown below:
> >
> > RDD: Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))
> >
> >
> > I am expecting the output Array((0,3), (1,50), (2,40)): just a sum over
> the values for each key
> >
> > Code:
> > RDD.reduceByKey((x,y) => x+y)
> > RDD.take(3)
> >
> > Result in console:
> > RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey
> at <console>:73
> > res:Array[(Int,Int)] = Array()
> >
> > Command used:
> >
> > dse spark --master local --jars postgresql-9.4-1201.jar -i  <ScriptFile>
> >
> >
> > Please let me know what is missing in my code, as the resulting Array is
> empty
> >
> >
> >
> > Regards,
> > Satish
> >
>
>

Re: Transformation not happening for reduceByKey or GroupByKey

Posted by "Abhishek R. Singh" <ab...@tetrationanalytics.com>.
You had:

> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)

Maybe try:

> val rdd2 = RDD.reduceByKey((x,y) => x+y)
> rdd2.take(3)

-Abhishek-
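
The groupByKey variant from the subject line behaves the same way: it too
returns a new RDD that must be captured. A sketch of the equivalent sum,
assuming the shell's sc (the name pairs is illustrative):

val pairs = sc.makeRDD(Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40)))
val grouped = pairs.groupByKey()        // RDD[(Int, Iterable[Int])]
grouped.mapValues(_.sum).collect()      // expected: Array((0,3), (1,50), (2,40))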

On Aug 20, 2015, at 3:05 AM, satish chandra j <js...@gmail.com> wrote:

> Hi All,
> I have data in an RDD as shown below:
>
> RDD: Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))
>
>
> I am expecting the output Array((0,3), (1,50), (2,40)): just a sum over the values for each key
>
> Code:
> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)
>
> Result in console:
> RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey at <console>:73
> res:Array[(Int,Int)] = Array()
>
> Command used:
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i  <ScriptFile>
>
>
> Please let me know what is missing in my code, as the resulting Array is empty
> 
> 
> 
> Regards,
> Satish
> 




Re: Transformation not happening for reduceByKey or GroupByKey

Posted by satish chandra j <js...@gmail.com>.
Yes, DSE 4.7

Regards,
Satish Chandra

On Fri, Aug 21, 2015 at 3:06 PM, Robin East <ro...@xense.co.uk> wrote:

> Not sure, never used dse - it’s part of DataStax Enterprise right?
>
> On 21 Aug 2015, at 10:07, satish chandra j <js...@gmail.com>
> wrote:
>
> Hi Robin,
> Yes, the below-mentioned piece of code works fine in the Spark shell, but
> when the same is placed in a script file and executed with -i <file name>,
> it creates an empty RDD
>
> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at
> makeRDD at <console>:28
>
>
> scala> pairs.reduceByKey((x,y) => x + y).collect
> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>
> Command:
>
>         dse spark --master local --jars postgresql-9.4-1201.jar -i
>  <ScriptFile>
>
> I understand I am missing something here, due to which my final RDD does
> not have the required output
>
> Regards,
> Satish Chandra
>
> On Thu, Aug 20, 2015 at 8:23 PM, Robin East <ro...@xense.co.uk>
> wrote:
>
>> This works for me:
>>
>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77]
>> at makeRDD at <console>:28
>>
>>
>> scala> pairs.reduceByKey((x,y) => x + y).collect
>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> On 20 Aug 2015, at 11:05, satish chandra j <js...@gmail.com>
>> wrote:
>>
>> Hi All,
>> I have data in an RDD as shown below:
>>
>> RDD: Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))
>>
>>
>> I am expecting the output Array((0,3), (1,50), (2,40)): just a sum over
>> the values for each key
>>
>> Code:
>> RDD.reduceByKey((x,y) => x+y)
>> RDD.take(3)
>>
>> Result in console:
>> RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey
>> at <console>:73
>> res:Array[(Int,Int)] = Array()
>>
>> Command used:
>>
>> dse spark --master local --jars postgresql-9.4-1201.jar -i  <ScriptFile>
>>
>>
>> Please let me know what is missing in my code, as the resulting Array is
>> empty
>>
>>
>>
>> Regards,
>> Satish
>>
>>
>>
>
>

Re: Transformation not happening for reduceByKey or GroupByKey

Posted by satish chandra j <js...@gmail.com>.
Hi All,
Any inputs on the actual problem statement?

Regards,
Satish


On Fri, Aug 21, 2015 at 5:57 PM, Jeff Zhang <zj...@gmail.com> wrote:

> Yong, thanks for your reply.
>
> I tried spark-shell -i <script-file>, and it works fine for me. Not sure
> what the difference is with
> dse spark --master local --jars postgresql-9.4-1201.jar -i  <ScriptFile>
>
> On Fri, Aug 21, 2015 at 7:01 PM, java8964 <ja...@hotmail.com> wrote:
>
>> I believe "spark-shell -i scriptFile" is there. We also use it, at least
>> in Spark 1.3.1.
>>
>> "dse spark" will just wrap "spark-shell" command, underline it is just
>> invoking "spark-shell".
>>
>> I don't know too much about the original problem though.
>>
>> Yong
>>
>> ------------------------------
>> Date: Fri, 21 Aug 2015 18:19:49 +0800
>> Subject: Re: Transformation not happening for reduceByKey or GroupByKey
>> From: zjffdu@gmail.com
>> To: jsatishchandra@gmail.com
>> CC: robin.east@xense.co.uk; user@spark.apache.org
>>
>>
>> Hi Satish,
>>
>> I don't see where Spark supports "-i", so I suspect it is provided by DSE.
>> In that case, it might be a bug in DSE.
>>
>>
>>
>> On Fri, Aug 21, 2015 at 6:02 PM, satish chandra j <
>> jsatishchandra@gmail.com> wrote:
>>
>> Hi Robin,
>> Yes, it is DSE, but the issue is related to Spark only
>>
>> Regards,
>> Satish Chandra
>>
>> On Fri, Aug 21, 2015 at 3:06 PM, Robin East <ro...@xense.co.uk>
>> wrote:
>>
>> Not sure, never used dse - it’s part of DataStax Enterprise right?
>>
>> On 21 Aug 2015, at 10:07, satish chandra j <js...@gmail.com>
>> wrote:
>>
>> Hi Robin,
>> Yes, the below-mentioned piece of code works fine in the Spark shell, but
>> when the same is placed in a script file and executed with -i <file name>,
>> it creates an empty RDD
>>
>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77]
>> at makeRDD at <console>:28
>>
>>
>> scala> pairs.reduceByKey((x,y) => x + y).collect
>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> Command:
>>
>>         dse spark --master local --jars postgresql-9.4-1201.jar -i
>>  <ScriptFile>
>>
>> I understand I am missing something here, due to which my final RDD does
>> not have the required output
>>
>> Regards,
>> Satish Chandra
>>
>> On Thu, Aug 20, 2015 at 8:23 PM, Robin East <ro...@xense.co.uk>
>> wrote:
>>
>> This works for me:
>>
>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77]
>> at makeRDD at <console>:28
>>
>>
>> scala> pairs.reduceByKey((x,y) => x + y).collect
>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> On 20 Aug 2015, at 11:05, satish chandra j <js...@gmail.com>
>> wrote:
>>
>> Hi All,
>> I have data in an RDD as shown below:
>>
>> RDD: Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))
>>
>>
>> I am expecting the output Array((0,3), (1,50), (2,40)): just a sum over
>> the values for each key
>>
>> Code:
>> RDD.reduceByKey((x,y) => x+y)
>> RDD.take(3)
>>
>> Result in console:
>> RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey
>> at <console>:73
>> res:Array[(Int,Int)] = Array()
>>
>> Command used:
>>
>> dse spark --master local --jars postgresql-9.4-1201.jar -i  <ScriptFile>
>>
>>
>> Please let me know what is missing in my code, as the resulting Array is
>> empty
>>
>>
>>
>> Regards,
>> Satish
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>

RE: Transformation not happening for reduceByKey or GroupByKey

Posted by java8964 <ja...@hotmail.com>.
I believe "spark-shell -i scriptFile" is there. We also use it, at least in Spark 1.3.1.
"dse spark" will just wrap "spark-shell" command, underline it is just invoking "spark-shell".
I don't know too much about the original problem though.
Yong
Date: Fri, 21 Aug 2015 18:19:49 +0800
Subject: Re: Transformation not happening for reduceByKey or GroupByKey
From: zjffdu@gmail.com
To: jsatishchandra@gmail.com
CC: robin.east@xense.co.uk; user@spark.apache.org

Hi Satish,
I don't see where Spark supports "-i", so I suspect it is provided by DSE. In that case, it might be a bug in DSE.


On Fri, Aug 21, 2015 at 6:02 PM, satish chandra j <js...@gmail.com> wrote:
Hi Robin,
Yes, it is DSE, but the issue is related to Spark only

Regards,
Satish Chandra

On Fri, Aug 21, 2015 at 3:06 PM, Robin East <ro...@xense.co.uk> wrote:
Not sure, never used dse - it’s part of DataStax Enterprise right?

On 21 Aug 2015, at 10:07, satish chandra j <js...@gmail.com> wrote:
Hi Robin,
Yes, the below-mentioned piece of code works fine in the Spark shell, but when the same is placed in a script file and executed with -i <file name>, it creates an empty RDD

scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28

scala> pairs.reduceByKey((x,y) => x + y).collect
res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))

Command:

        dse spark --master local --jars postgresql-9.4-1201.jar -i  <ScriptFile>

I understand I am missing something here, due to which my final RDD does not have the required output

Regards,
Satish Chandra

On Thu, Aug 20, 2015 at 8:23 PM, Robin East <ro...@xense.co.uk> wrote:
This works for me:

scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28

scala> pairs.reduceByKey((x,y) => x + y).collect
res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))

On 20 Aug 2015, at 11:05, satish chandra j <js...@gmail.com> wrote:
Hi All,
I have data in an RDD as shown below:

RDD: Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))

I am expecting the output Array((0,3), (1,50), (2,40)): just a sum over the values for each key

Code:
RDD.reduceByKey((x,y) => x+y)
RDD.take(3)

Result in console:
RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey at <console>:73
res:Array[(Int,Int)] = Array()

Command used:

dse spark --master local --jars postgresql-9.4-1201.jar -i  <ScriptFile>

Please let me know what is missing in my code, as the resulting Array is empty

Regards,
Satish









-- 
Best Regards

Jeff Zhang

Re: Transformation not happening for reduceByKey or GroupByKey

Posted by Jeff Zhang <zj...@gmail.com>.
Hi Satish,

I don't see where Spark supports "-i", so I suspect it is provided by DSE.
In that case, it might be a bug in DSE.



On Fri, Aug 21, 2015 at 6:02 PM, satish chandra j <js...@gmail.com>
wrote:

> Hi Robin,
> Yes, it is DSE, but the issue is related to Spark only
>
> Regards,
> Satish Chandra
>
> On Fri, Aug 21, 2015 at 3:06 PM, Robin East <ro...@xense.co.uk>
> wrote:
>
>> Not sure, never used dse - it’s part of DataStax Enterprise right?
>>
>> On 21 Aug 2015, at 10:07, satish chandra j <js...@gmail.com>
>> wrote:
>>
>> Hi Robin,
>> Yes, the below-mentioned piece of code works fine in the Spark shell, but
>> when the same is placed in a script file and executed with -i <file name>,
>> it creates an empty RDD
>>
>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77]
>> at makeRDD at <console>:28
>>
>>
>> scala> pairs.reduceByKey((x,y) => x + y).collect
>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> Command:
>>
>>         dse spark --master local --jars postgresql-9.4-1201.jar -i
>>  <ScriptFile>
>>
>> I understand I am missing something here, due to which my final RDD does
>> not have the required output
>>
>> Regards,
>> Satish Chandra
>>
>> On Thu, Aug 20, 2015 at 8:23 PM, Robin East <ro...@xense.co.uk>
>> wrote:
>>
>>> This works for me:
>>>
>>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77]
>>> at makeRDD at <console>:28
>>>
>>>
>>> scala> pairs.reduceByKey((x,y) => x + y).collect
>>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>>
>>> On 20 Aug 2015, at 11:05, satish chandra j <js...@gmail.com>
>>> wrote:
>>>
>>> Hi All,
>>> I have data in an RDD as shown below:
>>>
>>> RDD: Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))
>>>
>>>
>>> I am expecting the output Array((0,3), (1,50), (2,40)): just a sum over
>>> the values for each key
>>>
>>> Code:
>>> RDD.reduceByKey((x,y) => x+y)
>>> RDD.take(3)
>>>
>>> Result in console:
>>> RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey
>>> at <console>:73
>>> res:Array[(Int,Int)] = Array()
>>>
>>> Command used:
>>>
>>> dse spark --master local --jars postgresql-9.4-1201.jar -i  <ScriptFile>
>>>
>>>
>>> Please let me know what is missing in my code, as the resulting Array is
>>> empty
>>>
>>>
>>>
>>> Regards,
>>> Satish
>>>
>>>
>>>
>>
>>
>


-- 
Best Regards

Jeff Zhang
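
Yong's transcript above already shows plain spark-shell honoring -i in Spark
1.3.1. For anyone who wants to reproduce the check locally, a minimal script
(the file name check_i.scala is illustrative; the invocation is in the
comment):

// Save as check_i.scala and run:  spark-shell --master local -i check_i.scala
// If -i is honored, the sum prints before the interactive prompt appears,
// computed against the shell's own sc.
val pairs = sc.makeRDD(Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40)))
println(pairs.reduceByKey(_ + _).collect().mkString(", "))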

Re: Transformation not happening for reduceByKey or GroupByKey

Posted by satish chandra j <js...@gmail.com>.
Hi Robin,
Yes, it is DSE, but the issue is related to Spark only

Regards,
Satish Chandra

On Fri, Aug 21, 2015 at 3:06 PM, Robin East <ro...@xense.co.uk> wrote:

> Not sure, never used dse - it’s part of DataStax Enterprise right?
>
> On 21 Aug 2015, at 10:07, satish chandra j <js...@gmail.com>
> wrote:
>
> Hi Robin,
> Yes, the below-mentioned piece of code works fine in the Spark shell, but
> when the same is placed in a script file and executed with -i <file name>,
> it creates an empty RDD
>
> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at
> makeRDD at <console>:28
>
>
> scala> pairs.reduceByKey((x,y) => x + y).collect
> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>
> Command:
>
>         dse spark --master local --jars postgresql-9.4-1201.jar -i
>  <ScriptFile>
>
> I understand I am missing something here, due to which my final RDD does
> not have the required output
>
> Regards,
> Satish Chandra
>
> On Thu, Aug 20, 2015 at 8:23 PM, Robin East <ro...@xense.co.uk>
> wrote:
>
>> This works for me:
>>
>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77]
>> at makeRDD at <console>:28
>>
>>
>> scala> pairs.reduceByKey((x,y) => x + y).collect
>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> On 20 Aug 2015, at 11:05, satish chandra j <js...@gmail.com>
>> wrote:
>>
>> Hi All,
>> I have data in an RDD as shown below:
>>
>> RDD: Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))
>>
>>
>> I am expecting the output Array((0,3), (1,50), (2,40)): just a sum over
>> the values for each key
>>
>> Code:
>> RDD.reduceByKey((x,y) => x+y)
>> RDD.take(3)
>>
>> Result in console:
>> RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey
>> at <console>:73
>> res:Array[(Int,Int)] = Array()
>>
>> Command used:
>>
>> dse spark --master local --jars postgresql-9.4-1201.jar -i  <ScriptFile>
>>
>>
>> Please let me know what is missing in my code, as the resulting Array is
>> empty
>>
>>
>>
>> Regards,
>> Satish
>>
>>
>>
>
>

Re: Transformation not happening for reduceByKey or GroupByKey

Posted by satish chandra j <js...@gmail.com>.
Hi Robin,
Yes, the below-mentioned piece of code works fine in the Spark shell, but
when the same is placed in a script file and executed with -i <file name>,
it creates an empty RDD

scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at
makeRDD at <console>:28


scala> pairs.reduceByKey((x,y) => x + y).collect
res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))

Command:

        dse spark --master local --jars postgresql-9.4-1201.jar -i
 <ScriptFile>

I understand I am missing something here, due to which my final RDD does
not have the required output

Regards,
Satish Chandra
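
One way to narrow this down from inside the script file itself is to check
which SparkContext the RDD is bound to (a sketch; RDD.context reports the
context an RDD belongs to, and sc is the context the shell provides):

// Put at the top of the script that is run via -i:
println(sc)                           // the shell's own context
val pairs = sc.makeRDD(Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40)))
println(pairs.context eq sc)          // true when the RDD uses the shell's sc
println(pairs.reduceByKey(_ + _).collect().mkString(", "))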

On Thu, Aug 20, 2015 at 8:23 PM, Robin East <ro...@xense.co.uk> wrote:

> This works for me:
>
> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at
> makeRDD at <console>:28
>
>
> scala> pairs.reduceByKey((x,y) => x + y).collect
> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>
> On 20 Aug 2015, at 11:05, satish chandra j <js...@gmail.com>
> wrote:
>
> Hi All,
> I have data in an RDD as shown below:
>
> RDD: Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))
>
>
> I am expecting the output Array((0,3), (1,50), (2,40)): just a sum over the
> values for each key
>
> Code:
> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)
>
> Result in console:
> RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey at
> <console>:73
> res:Array[(Int,Int)] = Array()
>
> Command used:
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i  <ScriptFile>
>
>
> Please let me know what is missing in my code, as the resulting Array is
> empty
>
>
>
> Regards,
> Satish
>
>
>