Posted to user@spark.apache.org by satish chandra j <js...@gmail.com> on 2015/08/20 12:05:52 UTC
Transformation not happening for reduceByKey or GroupByKey
Hi All,
I have data in an RDD as mentioned below:
RDD: Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))
I am expecting the output Array((0,3), (1,50), (2,40)), i.e. just a sum over the
values for each key.
Code:
RDD.reduceByKey((x,y) => x+y)
RDD.take(3)
Result in console:
RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey at
<console>:73
res:Array[(Int,Int)] = Array()
Command used:
dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
Please let me know what is missing in my code, as the resulting Array is
empty.
Regards,
Satish
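For readers following the thread: the per-key sum the poster expects from reduceByKey can be sketched in plain Scala, with an ordinary Seq standing in for the RDD. This is a local sketch only (no Spark, names are illustrative), not the poster's actual code:

```scala
// Local sketch of what reduceByKey((x, y) => x + y) computes per key.
// A plain Seq stands in for the RDD; no Spark is required to run this.
object SumByKey {
  def sumByKey(pairs: Seq[(Int, Int)]): Map[Int, Int] =
    pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    val data = Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40))
    // Per-key sums: 0 -> 3, 1 -> 50, 2 -> 40
    println(sumByKey(data))
  }
}
```

On a real RDD the same expression is `rdd.reduceByKey(_ + _)`, and nothing is materialized until an action such as collect or take runs.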
Re: Transformation not happening for reduceByKey or GroupByKey
Posted by satish chandra j <js...@gmail.com>.
Hi All,
Could anybody let me know what it is that I am missing here? It should work, as
it is a basic transformation.
Please let me know if any additional information is required.
Regards,
Satish
On Thu, Aug 20, 2015 at 3:35 PM, satish chandra j <js...@gmail.com>
wrote:
> HI All,
> I have data in RDD as mentioned below:
>
> RDD : Array[(Int),(Int)] = Array((0,1), (0,2),(1,20),(1,30),(2,40))
>
>
> I am expecting output as Array((0,3),(1,50),(2,40)) just a sum function on
> Values for each key
>
> Code:
> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)
>
> Result in console:
> RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey at
> <console>:73
> res:Array[(Int,Int)] = Array()
>
> Command as mentioned
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>
>
> Please let me know what is missing in my code, as my resultant Array is
> empty
>
>
>
> Regards,
> Satish
>
>
Re: Transformation not happening for reduceByKey or GroupByKey
Posted by satish chandra j <js...@gmail.com>.
Hi All,
Please find below the fix for users who have been following this issue on the
mail chain, along with the respective solution:
*reduceByKey: Non working snippet*
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
val conf = new SparkConf()
val sc = new SparkContext(conf)
val DataRDD = sc.makeRDD(Seq((0,1),(0,2),(1,2),(1,3),(2,4)))
DataRDD.reduceByKey(_+_).collect
Result: Array() (empty)
*reduceByKey: Working snippet*
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
val conf = new SparkConf().set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(conf)
val DataRDD = sc.makeRDD(Seq((0,1),(0,2),(1,2),(1,3),(2,4)))
DataRDD.reduceByKey(_+_).collect
Result: Array((0,3),(1,5),(2,4))
Regards,
Satish Chandra
On Sat, Aug 22, 2015 at 11:27 AM, satish chandra j <jsatishchandra@gmail.com
> wrote:
> HI All,
> Currently using DSE 4.7 and Spark 1.2.2 version
>
> Regards,
> Satish
>
> On Fri, Aug 21, 2015 at 7:30 PM, java8964 <ja...@hotmail.com> wrote:
>
>> What version of Spark you are using, or comes with DSE 4.7?
>>
>> We just cannot reproduce it in Spark.
>>
>> yzhang@localhost>$ more test.spark
>> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs.reduceByKey((x,y) => x + y).collect
>> yzhang@localhost>$ ~/spark/bin/spark-shell --master local -i test.spark
>> Welcome to
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/ '_/
>>    /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
>>       /_/
>>
>> Using Scala version 2.10.4
>> Spark context available as sc.
>> SQL context available as sqlContext.
>> Loading test.spark...
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[0] at
>> makeRDD at <console>:21
>> 15/08/21 09:58:51 WARN SizeEstimator: Failed to check whether
>> UseCompressedOops is set; assuming yes
>> res0: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> Yong
>>
>>
>> ------------------------------
>> Date: Fri, 21 Aug 2015 19:24:09 +0530
>> Subject: Re: Transformation not happening for reduceByKey or GroupByKey
>> From: jsatishchandra@gmail.com
>> To: abhishsi@tetrationanalytics.com
>> CC: user@spark.apache.org
>>
>>
>> HI Abhishek,
>>
>> I have even tried that but rdd2 is empty
>>
>> Regards,
>> Satish
>>
>> On Fri, Aug 21, 2015 at 6:47 PM, Abhishek R. Singh <
>> abhishsi@tetrationanalytics.com> wrote:
>>
>> You had:
>>
>> > RDD.reduceByKey((x,y) => x+y)
>> > RDD.take(3)
>>
>> Maybe try:
>>
>> > rdd2 = RDD.reduceByKey((x,y) => x+y)
>> > rdd2.take(3)
>>
>> -Abhishek-
>>
>> On Aug 20, 2015, at 3:05 AM, satish chandra j <js...@gmail.com>
>> wrote:
>>
>> > HI All,
>> > I have data in RDD as mentioned below:
>> >
>> > RDD : Array[(Int),(Int)] = Array((0,1), (0,2),(1,20),(1,30),(2,40))
>> >
>> >
>> > I am expecting output as Array((0,3),(1,50),(2,40)) just a sum function
>> on Values for each key
>> >
>> > Code:
>> > RDD.reduceByKey((x,y) => x+y)
>> > RDD.take(3)
>> >
>> > Result in console:
>> > RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey
>> at <console>:73
>> > res:Array[(Int,Int)] = Array()
>> >
>> > Command as mentioned
>> >
>> > dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>> >
>> >
>> > Please let me know what is missing in my code, as my resultant Array is
>> empty
>> >
>> >
>> >
>> > Regards,
>> > Satish
>> >
>>
>>
>>
>
Re: Transformation not happening for reduceByKey or GroupByKey
Posted by satish chandra j <js...@gmail.com>.
Hi All,
Currently using DSE 4.7 with Spark 1.2.2.
Regards,
Satish
On Fri, Aug 21, 2015 at 7:30 PM, java8964 <ja...@hotmail.com> wrote:
> What version of Spark you are using, or comes with DSE 4.7?
>
> We just cannot reproduce it in Spark.
>
> yzhang@localhost>$ more test.spark
> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
> pairs.reduceByKey((x,y) => x + y).collect
> yzhang@localhost>$ ~/spark/bin/spark-shell --master local -i test.spark
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
>       /_/
>
> Using Scala version 2.10.4
> Spark context available as sc.
> SQL context available as sqlContext.
> Loading test.spark...
> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[0] at
> makeRDD at <console>:21
> 15/08/21 09:58:51 WARN SizeEstimator: Failed to check whether
> UseCompressedOops is set; assuming yes
> res0: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>
> Yong
>
>
> ------------------------------
> Date: Fri, 21 Aug 2015 19:24:09 +0530
> Subject: Re: Transformation not happening for reduceByKey or GroupByKey
> From: jsatishchandra@gmail.com
> To: abhishsi@tetrationanalytics.com
> CC: user@spark.apache.org
>
>
> HI Abhishek,
>
> I have even tried that but rdd2 is empty
>
> Regards,
> Satish
>
> On Fri, Aug 21, 2015 at 6:47 PM, Abhishek R. Singh <
> abhishsi@tetrationanalytics.com> wrote:
>
> You had:
>
> > RDD.reduceByKey((x,y) => x+y)
> > RDD.take(3)
>
> Maybe try:
>
> > rdd2 = RDD.reduceByKey((x,y) => x+y)
> > rdd2.take(3)
>
> -Abhishek-
>
> On Aug 20, 2015, at 3:05 AM, satish chandra j <js...@gmail.com>
> wrote:
>
> > HI All,
> > I have data in RDD as mentioned below:
> >
> > RDD : Array[(Int),(Int)] = Array((0,1), (0,2),(1,20),(1,30),(2,40))
> >
> >
> > I am expecting output as Array((0,3),(1,50),(2,40)) just a sum function
> on Values for each key
> >
> > Code:
> > RDD.reduceByKey((x,y) => x+y)
> > RDD.take(3)
> >
> > Result in console:
> > RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey
> at <console>:73
> > res:Array[(Int,Int)] = Array()
> >
> > Command as mentioned
> >
> > dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
> >
> >
> > Please let me know what is missing in my code, as my resultant Array is
> empty
> >
> >
> >
> > Regards,
> > Satish
> >
>
>
>
RE: Transformation not happening for reduceByKey or GroupByKey
Posted by java8964 <ja...@hotmail.com>.
What version of Spark are you using, or does it come with DSE 4.7?
We just cannot reproduce it in Spark.
yzhang@localhost>$ more test.spark
val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs.reduceByKey((x,y) => x + y).collect
yzhang@localhost>$ ~/spark/bin/spark-shell --master local -i test.spark
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
      /_/

Using Scala version 2.10.4
Spark context available as sc.
SQL context available as sqlContext.
Loading test.spark...
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[0] at makeRDD at <console>:21
15/08/21 09:58:51 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
res0: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
Yong
Date: Fri, 21 Aug 2015 19:24:09 +0530
Subject: Re: Transformation not happening for reduceByKey or GroupByKey
From: jsatishchandra@gmail.com
To: abhishsi@tetrationanalytics.com
CC: user@spark.apache.org
HI Abhishek,
I have even tried that but rdd2 is empty
Regards,
Satish
On Fri, Aug 21, 2015 at 6:47 PM, Abhishek R. Singh <ab...@tetrationanalytics.com> wrote:
You had:
> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)
Maybe try:
> rdd2 = RDD.reduceByKey((x,y) => x+y)
> rdd2.take(3)
-Abhishek-
On Aug 20, 2015, at 3:05 AM, satish chandra j <js...@gmail.com> wrote:
> HI All,
> I have data in RDD as mentioned below:
>
> RDD : Array[(Int),(Int)] = Array((0,1), (0,2),(1,20),(1,30),(2,40))
>
>
> I am expecting output as Array((0,3),(1,50),(2,40)) just a sum function on Values for each key
>
> Code:
> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)
>
> Result in console:
> RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey at <console>:73
> res:Array[(Int,Int)] = Array()
>
> Command as mentioned
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>
>
> Please let me know what is missing in my code, as my resultant Array is empty
>
>
>
> Regards,
> Satish
>
Re: Transformation not happening for reduceByKey or GroupByKey
Posted by satish chandra j <js...@gmail.com>.
Hi Abhishek,
I have even tried that, but rdd2 is empty.
Regards,
Satish
On Fri, Aug 21, 2015 at 6:47 PM, Abhishek R. Singh <
abhishsi@tetrationanalytics.com> wrote:
> You had:
>
> > RDD.reduceByKey((x,y) => x+y)
> > RDD.take(3)
>
> Maybe try:
>
> > rdd2 = RDD.reduceByKey((x,y) => x+y)
> > rdd2.take(3)
>
> -Abhishek-
>
> On Aug 20, 2015, at 3:05 AM, satish chandra j <js...@gmail.com>
> wrote:
>
> > HI All,
> > I have data in RDD as mentioned below:
> >
> > RDD : Array[(Int),(Int)] = Array((0,1), (0,2),(1,20),(1,30),(2,40))
> >
> >
> > I am expecting output as Array((0,3),(1,50),(2,40)) just a sum function
> on Values for each key
> >
> > Code:
> > RDD.reduceByKey((x,y) => x+y)
> > RDD.take(3)
> >
> > Result in console:
> > RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey
> at <console>:73
> > res:Array[(Int,Int)] = Array()
> >
> > Command as mentioned
> >
> > dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
> >
> >
> > Please let me know what is missing in my code, as my resultant Array is
> empty
> >
> >
> >
> > Regards,
> > Satish
> >
>
>
Re: Transformation not happening for reduceByKey or GroupByKey
Posted by "Abhishek R. Singh" <ab...@tetrationanalytics.com>.
You had:
> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)
Maybe try:
> rdd2 = RDD.reduceByKey((x,y) => x+y)
> rdd2.take(3)
-Abhishek-
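Abhishek's suggestion, capturing the transformation's result in a new value before calling an action, can be illustrated locally in plain Scala, with a Seq in place of the RDD (a hedged sketch; the names and the groupBy emulation are illustrative, not Spark itself):

```scala
// Illustrates the assign-then-act pattern from the reply above, on a plain Seq.
object AssignThenAct {
  // reduceByKey returns a NEW dataset; the original is left untouched, so the
  // result must be captured (rdd2 below) before calling take/collect on it.
  def reduced(pairs: Seq[(Int, Int)]): Seq[(Int, Int)] =
    pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }.toSeq.sortBy(_._1)

  def main(args: Array[String]): Unit = {
    val rdd = Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40))
    val rdd2 = reduced(rdd) // capture the transformed data in a new value
    println(rdd2.take(3))   // the three per-key sums: (0,3), (1,50), (2,40)
  }
}
```

Calling `rdd.reduceByKey(...)` and then `rdd.take(3)` discards the new dataset and acts on the original, which is the mistake this reply points out.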
On Aug 20, 2015, at 3:05 AM, satish chandra j <js...@gmail.com> wrote:
> HI All,
> I have data in RDD as mentioned below:
>
> RDD : Array[(Int),(Int)] = Array((0,1), (0,2),(1,20),(1,30),(2,40))
>
>
> I am expecting output as Array((0,3),(1,50),(2,40)) just a sum function on Values for each key
>
> Code:
> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)
>
> Result in console:
> RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey at <console>:73
> res:Array[(Int,Int)] = Array()
>
> Command as mentioned
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>
>
> Please let me know what is missing in my code, as my resultant Array is empty
>
>
>
> Regards,
> Satish
>
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Transformation not happening for reduceByKey or GroupByKey
Posted by satish chandra j <js...@gmail.com>.
Yes, DSE 4.7
Regards,
Satish Chandra
On Fri, Aug 21, 2015 at 3:06 PM, Robin East <ro...@xense.co.uk> wrote:
> Not sure, never used dse - it’s part of DataStax Enterprise right?
>
> On 21 Aug 2015, at 10:07, satish chandra j <js...@gmail.com>
> wrote:
>
> HI Robin,
> Yes, the below piece of code works fine in the Spark shell, but the same code
> placed in a script file and executed with -i <file name> creates an
> empty RDD
>
> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at
> makeRDD at <console>:28
>
>
> scala> pairs.reduceByKey((x,y) => x + y).collect
> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>
> Command:
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i
> <ScriptFile>
>
> I understand I am missing something here, due to which my final RDD does
> not have the required output
>
> Regards,
> Satish Chandra
>
> On Thu, Aug 20, 2015 at 8:23 PM, Robin East <ro...@xense.co.uk>
> wrote:
>
>> This works for me:
>>
>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77]
>> at makeRDD at <console>:28
>>
>>
>> scala> pairs.reduceByKey((x,y) => x + y).collect
>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> On 20 Aug 2015, at 11:05, satish chandra j <js...@gmail.com>
>> wrote:
>>
>> HI All,
>> I have data in RDD as mentioned below:
>>
>> RDD : Array[(Int),(Int)] = Array((0,1), (0,2),(1,20),(1,30),(2,40))
>>
>>
>> I am expecting output as Array((0,3),(1,50),(2,40)) just a sum function
>> on Values for each key
>>
>> Code:
>> RDD.reduceByKey((x,y) => x+y)
>> RDD.take(3)
>>
>> Result in console:
>> RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey
>> at <console>:73
>> res:Array[(Int,Int)] = Array()
>>
>> Command as mentioned
>>
>> dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>>
>>
>> Please let me know what is missing in my code, as my resultant Array is
>> empty
>>
>>
>>
>> Regards,
>> Satish
>>
>>
>>
>
>
Re: Transformation not happening for reduceByKey or GroupByKey
Posted by satish chandra j <js...@gmail.com>.
Hi All,
Any inputs on the actual problem statement?
Regards,
Satish
On Fri, Aug 21, 2015 at 5:57 PM, Jeff Zhang <zj...@gmail.com> wrote:
> Yong, Thanks for your reply.
>
> I tried spark-shell -i <script-file>, and it works fine for me. Not sure how it
> differs from
> dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>
> On Fri, Aug 21, 2015 at 7:01 PM, java8964 <ja...@hotmail.com> wrote:
>
>> I believe "spark-shell -i scriptFile" is there. We also use it, at least
>> in Spark 1.3.1.
>>
>> "dse spark" will just wrap the "spark-shell" command; underneath, it is just
>> invoking "spark-shell".
>>
>> I don't know too much about the original problem though.
>>
>> Yong
>>
>> ------------------------------
>> Date: Fri, 21 Aug 2015 18:19:49 +0800
>> Subject: Re: Transformation not happening for reduceByKey or GroupByKey
>> From: zjffdu@gmail.com
>> To: jsatishchandra@gmail.com
>> CC: robin.east@xense.co.uk; user@spark.apache.org
>>
>>
>> Hi Satish,
>>
>> I don't see where Spark supports "-i", so I suspect it is provided by DSE.
>> In that case, it might be a bug in DSE.
>>
>>
>>
>> On Fri, Aug 21, 2015 at 6:02 PM, satish chandra j <
>> jsatishchandra@gmail.com> wrote:
>>
>> HI Robin,
>> Yes, it is DSE but issue is related to Spark only
>>
>> Regards,
>> Satish Chandra
>>
>> On Fri, Aug 21, 2015 at 3:06 PM, Robin East <ro...@xense.co.uk>
>> wrote:
>>
>> Not sure, never used dse - it’s part of DataStax Enterprise right?
>>
>> On 21 Aug 2015, at 10:07, satish chandra j <js...@gmail.com>
>> wrote:
>>
>> HI Robin,
>> Yes, the below piece of code works fine in the Spark shell, but the same code
>> placed in a script file and executed with -i <file name> creates an
>> empty RDD
>>
>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77]
>> at makeRDD at <console>:28
>>
>>
>> scala> pairs.reduceByKey((x,y) => x + y).collect
>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> Command:
>>
>> dse spark --master local --jars postgresql-9.4-1201.jar -i
>> <ScriptFile>
>>
>> I understand I am missing something here, due to which my final RDD does
>> not have the required output
>>
>> Regards,
>> Satish Chandra
>>
>> On Thu, Aug 20, 2015 at 8:23 PM, Robin East <ro...@xense.co.uk>
>> wrote:
>>
>> This works for me:
>>
>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77]
>> at makeRDD at <console>:28
>>
>>
>> scala> pairs.reduceByKey((x,y) => x + y).collect
>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> On 20 Aug 2015, at 11:05, satish chandra j <js...@gmail.com>
>> wrote:
>>
>> HI All,
>> I have data in RDD as mentioned below:
>>
>> RDD : Array[(Int),(Int)] = Array((0,1), (0,2),(1,20),(1,30),(2,40))
>>
>>
>> I am expecting output as Array((0,3),(1,50),(2,40)) just a sum function
>> on Values for each key
>>
>> Code:
>> RDD.reduceByKey((x,y) => x+y)
>> RDD.take(3)
>>
>> Result in console:
>> RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey
>> at <console>:73
>> res:Array[(Int,Int)] = Array()
>>
>> Command as mentioned
>>
>> dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>>
>>
>> Please let me know what is missing in my code, as my resultant Array is
>> empty
>>
>>
>>
>> Regards,
>> Satish
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>
RE: Transformation not happening for reduceByKey or GroupByKey
Posted by java8964 <ja...@hotmail.com>.
I believe "spark-shell -i scriptFile" is there. We also use it, at least in Spark 1.3.1.
"dse spark" will just wrap the "spark-shell" command; underneath, it is just invoking "spark-shell".
I don't know too much about the original problem though.
Yong
Date: Fri, 21 Aug 2015 18:19:49 +0800
Subject: Re: Transformation not happening for reduceByKey or GroupByKey
From: zjffdu@gmail.com
To: jsatishchandra@gmail.com
CC: robin.east@xense.co.uk; user@spark.apache.org
Hi Satish,
I don't see where Spark supports "-i", so I suspect it is provided by DSE. In that case, it might be a bug in DSE.
On Fri, Aug 21, 2015 at 6:02 PM, satish chandra j <js...@gmail.com> wrote:
HI Robin,
Yes, it is DSE, but the issue is related to Spark only.
Regards,
Satish Chandra
On Fri, Aug 21, 2015 at 3:06 PM, Robin East <ro...@xense.co.uk> wrote:
Not sure, never used dse - it’s part of DataStax Enterprise right?
On 21 Aug 2015, at 10:07, satish chandra j <js...@gmail.com> wrote:
HI Robin,
Yes, the below piece of code works fine in the Spark shell, but the same code
placed in a script file and executed with -i <file name> creates an empty RDD
scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28
scala> pairs.reduceByKey((x,y) => x + y).collect
res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
Command:
dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
I understand I am missing something here, due to which my final RDD does not have the required output
Regards,
Satish Chandra
On Thu, Aug 20, 2015 at 8:23 PM, Robin East <ro...@xense.co.uk> wrote:
This works for me:
scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28
scala> pairs.reduceByKey((x,y) => x + y).collect
res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
On 20 Aug 2015, at 11:05, satish chandra j <js...@gmail.com> wrote:
HI All,
I have data in RDD as mentioned below:
RDD : Array[(Int),(Int)] = Array((0,1), (0,2),(1,20),(1,30),(2,40))
I am expecting output as Array((0,3),(1,50),(2,40)) just a sum function on Values for each key
Code:
RDD.reduceByKey((x,y) => x+y)
RDD.take(3)
Result in console:
RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey at <console>:73
res:Array[(Int,Int)] = Array()
Command as mentioned
dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
Please let me know what is missing in my code, as my resultant Array is empty
Regards,
Satish
--
Best Regards
Jeff Zhang
Re: Transformation not happening for reduceByKey or GroupByKey
Posted by Jeff Zhang <zj...@gmail.com>.
Hi Satish,
I don't see where Spark supports "-i", so I suspect it is provided by DSE. In
that case, it might be a bug in DSE.
On Fri, Aug 21, 2015 at 6:02 PM, satish chandra j <js...@gmail.com>
wrote:
> HI Robin,
> Yes, it is DSE but issue is related to Spark only
>
> Regards,
> Satish Chandra
>
> On Fri, Aug 21, 2015 at 3:06 PM, Robin East <ro...@xense.co.uk>
> wrote:
>
>> Not sure, never used dse - it’s part of DataStax Enterprise right?
>>
>> On 21 Aug 2015, at 10:07, satish chandra j <js...@gmail.com>
>> wrote:
>>
>> HI Robin,
>> Yes, the below piece of code works fine in the Spark shell, but the same code
>> placed in a script file and executed with -i <file name> creates an
>> empty RDD
>>
>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77]
>> at makeRDD at <console>:28
>>
>>
>> scala> pairs.reduceByKey((x,y) => x + y).collect
>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> Command:
>>
>> dse spark --master local --jars postgresql-9.4-1201.jar -i
>> <ScriptFile>
>>
>> I understand I am missing something here, due to which my final RDD does
>> not have the required output
>>
>> Regards,
>> Satish Chandra
>>
>> On Thu, Aug 20, 2015 at 8:23 PM, Robin East <ro...@xense.co.uk>
>> wrote:
>>
>>> This works for me:
>>>
>>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77]
>>> at makeRDD at <console>:28
>>>
>>>
>>> scala> pairs.reduceByKey((x,y) => x + y).collect
>>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>>
>>> On 20 Aug 2015, at 11:05, satish chandra j <js...@gmail.com>
>>> wrote:
>>>
>>> HI All,
>>> I have data in RDD as mentioned below:
>>>
>>> RDD : Array[(Int),(Int)] = Array((0,1), (0,2),(1,20),(1,30),(2,40))
>>>
>>>
>>> I am expecting output as Array((0,3),(1,50),(2,40)) just a sum function
>>> on Values for each key
>>>
>>> Code:
>>> RDD.reduceByKey((x,y) => x+y)
>>> RDD.take(3)
>>>
>>> Result in console:
>>> RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey
>>> at <console>:73
>>> res:Array[(Int,Int)] = Array()
>>>
>>> Command as mentioned
>>>
>>> dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>>>
>>>
>>> Please let me know what is missing in my code, as my resultant Array is
>>> empty
>>>
>>>
>>>
>>> Regards,
>>> Satish
>>>
>>>
>>>
>>
>>
>
--
Best Regards
Jeff Zhang
Re: Transformation not happening for reduceByKey or GroupByKey
Posted by satish chandra j <js...@gmail.com>.
Hi Robin,
Yes, it is DSE, but the issue is related to Spark only.
Regards,
Satish Chandra
On Fri, Aug 21, 2015 at 3:06 PM, Robin East <ro...@xense.co.uk> wrote:
> Not sure, never used dse - it’s part of DataStax Enterprise right?
>
> On 21 Aug 2015, at 10:07, satish chandra j <js...@gmail.com>
> wrote:
>
> HI Robin,
> Yes, the below piece of code works fine in the Spark shell, but the same code
> placed in a script file and executed with -i <file name> creates an
> empty RDD
>
> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at
> makeRDD at <console>:28
>
>
> scala> pairs.reduceByKey((x,y) => x + y).collect
> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>
> Command:
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i
> <ScriptFile>
>
> I understand I am missing something here, due to which my final RDD does
> not have the required output
>
> Regards,
> Satish Chandra
>
> On Thu, Aug 20, 2015 at 8:23 PM, Robin East <ro...@xense.co.uk>
> wrote:
>
>> This works for me:
>>
>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77]
>> at makeRDD at <console>:28
>>
>>
>> scala> pairs.reduceByKey((x,y) => x + y).collect
>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> On 20 Aug 2015, at 11:05, satish chandra j <js...@gmail.com>
>> wrote:
>>
>> HI All,
>> I have data in RDD as mentioned below:
>>
>> RDD : Array[(Int),(Int)] = Array((0,1), (0,2),(1,20),(1,30),(2,40))
>>
>>
>> I am expecting output as Array((0,3),(1,50),(2,40)) just a sum function
>> on Values for each key
>>
>> Code:
>> RDD.reduceByKey((x,y) => x+y)
>> RDD.take(3)
>>
>> Result in console:
>> RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey
>> at <console>:73
>> res:Array[(Int,Int)] = Array()
>>
>> Command as mentioned
>>
>> dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>>
>>
>> Please let me know what is missing in my code, as my resultant Array is
>> empty
>>
>>
>>
>> Regards,
>> Satish
>>
>>
>>
>
>
Re: Transformation not happening for reduceByKey or GroupByKey
Posted by satish chandra j <js...@gmail.com>.
Hi Robin,
Yes, the below piece of code works fine in the Spark shell, but the same code
placed in a script file and executed with -i <file name> creates an
empty RDD
scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at
makeRDD at <console>:28
scala> pairs.reduceByKey((x,y) => x + y).collect
res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
Command:
dse spark --master local --jars postgresql-9.4-1201.jar -i
<ScriptFile>
I understand I am missing something here, due to which my final RDD does
not have the required output
Regards,
Satish Chandra
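One way to narrow this down is to run the very same script file through plain spark-shell and through the dse wrapper and compare output. This is a hedged sketch: the file path is illustrative, and the two commented-out commands are the ones already quoted in this thread (they require a Spark / DSE installation):

```shell
#!/bin/sh
# Write the two-line script both shells will load (path is illustrative).
cat > /tmp/test.spark <<'EOF'
val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs.reduceByKey((x,y) => x + y).collect.foreach(println)
EOF

# Then run it both ways and compare (requires Spark / DSE, so commented out):
# ~/spark/bin/spark-shell --master local -i /tmp/test.spark
# dse spark --master local --jars postgresql-9.4-1201.jar -i /tmp/test.spark
echo "wrote /tmp/test.spark"
```

If plain spark-shell prints the three sums and dse spark does not, the problem is in the DSE wrapper's handling of -i rather than in the transformation itself.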
On Thu, Aug 20, 2015 at 8:23 PM, Robin East <ro...@xense.co.uk> wrote:
> This works for me:
>
> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at
> makeRDD at <console>:28
>
>
> scala> pairs.reduceByKey((x,y) => x + y).collect
> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>
> On 20 Aug 2015, at 11:05, satish chandra j <js...@gmail.com>
> wrote:
>
> HI All,
> I have data in RDD as mentioned below:
>
> RDD : Array[(Int),(Int)] = Array((0,1), (0,2),(1,20),(1,30),(2,40))
>
>
> I am expecting output as Array((0,3),(1,50),(2,40)) just a sum function on
> Values for each key
>
> Code:
> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)
>
> Result in console:
> RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey at
> <console>:73
> res:Array[(Int,Int)] = Array()
>
> Command as mentioned
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>
>
> Please let me know what is missing in my code, as my resultant Array is
> empty
>
>
>
> Regards,
> Satish
>
>
>