Posted to user@spark.apache.org by "boyingking@163.com" <bo...@163.com> on 2014/09/16 05:38:57 UTC

About SparkSQL OR MLlib

Hi,
I have a dataset with the structure [id, driverAge, TotalKiloMeter, Emissions, date, KiloMeter, fuel], and the data looks like this:
[1-980,34,221926,9,2005-2-8,123,14]
[1-981,49,271321,15,2005-2-8,181,82]
[1-982,36,189149,18,2005-2-8,162,51]
[1-983,51,232753,5,2005-2-8,106,92]
[1-984,56,45338,8,2005-2-8,156,98]
[1-985,45,132060,4,2005-2-8,179,98]
[1-986,40,15751,5,2005-2-8,149,77]
[1-987,36,167930,17,2005-2-8,121,87]
[1-988,53,44949,4,2005-2-8,195,72]
[1-989,34,252867,5,2005-2-8,181,86]
[1-990,53,152858,11,2005-2-8,130,43]
[1-991,40,126831,11,2005-2-8,126,47]
………………………………………………

My requirement is to group by driverAge in steps of five, for example 20~25 is one group and 26~30 is another.
How should I do this? Could anyone share some code?





boyingking@163.com

Re: About SparkSQL OR MLlib

Posted by Soumya Simanta <so...@gmail.com>.

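// Assuming java.util.Date here; any serializable date type would do.
import java.util.Date

case class Car(id: String, age: Int, tkm: Int, emissions: Int, date: Date,
               km: Int, fuel: Int)

1. Create a pair RDD of (age, Car) tuples (pairedRDD).
2. Create a new function fc:

// Returns the lower (inclusive) and upper (exclusive) bound of the
// interval that contains x, e.g. fc(34, 5) gives (30, 35).
def fc(x: Int, interval: Int): (Int, Int) = {
  val floor = x - (x % interval)
  val ceil = floor + interval
  (floor, ceil)
}

3. Do a groupBy on the RDD from step 1, passing the function fc:

// Each element of pairedRDD is an (age, Car) tuple, so x._1 is the age.
val myrdd = pairedRDD.groupBy(x => fc(x._1, 5))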
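
To see what the grouping produces, here is a minimal sketch that runs the same three steps on a few of the sample rows using plain Scala collections instead of an RDD (Seq.groupBy takes a key function in the same way RDD.groupBy does). The object name AgeBuckets, the parse helper, and keeping the date as a String are assumptions made only for this illustration, not part of the original suggestion.

```scala
// Simplified record for illustration: the date is kept as a String.
final case class Car(id: String, age: Int, tkm: Int, emissions: Int,
                     date: String, km: Int, fuel: Int)

object AgeBuckets {
  // Lower (inclusive) and upper (exclusive) bound of the bucket containing x.
  def fc(x: Int, interval: Int): (Int, Int) = {
    val floor = x - (x % interval)
    (floor, floor + interval)
  }

  // Hypothetical helper: parses one line such as "[1-980,34,221926,9,2005-2-8,123,14]".
  def parse(line: String): Car = {
    val f = line.trim.stripPrefix("[").stripSuffix("]").split(",").map(_.trim)
    Car(f(0), f(1).toInt, f(2).toInt, f(3).toInt, f(4), f(5).toInt, f(6).toInt)
  }

  val sample: Seq[String] = Seq(
    "[1-980,34,221926,9,2005-2-8,123,14]",
    "[1-981,49,271321,15,2005-2-8,181,82]",
    "[1-982,36,189149,18,2005-2-8,162,51]",
    "[1-983,51,232753,5,2005-2-8,106,92]"
  )

  // Step 1: key each record by age. Step 3: group by the bucket returned by fc.
  val paired: Seq[(Int, Car)] = sample.map(parse).map(c => (c.age, c))
  val grouped: Map[(Int, Int), Seq[(Int, Car)]] = paired.groupBy(x => fc(x._1, 5))

  def main(args: Array[String]): Unit =
    grouped.toSeq.sortBy(_._1).foreach { case ((lo, hi), cars) =>
      println(s"[$lo, $hi): ${cars.map(_._2.id).mkString(", ")}")
    }
}
```

Note that fc yields half-open buckets such as [20, 25) and [25, 30), so age 25 falls in the second bucket. If the groups should instead be 21~25 and 26~30, one option is to compute the lower bound as ((x - 1) / interval) * interval + 1 and treat the upper bound as inclusive.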

