Posted to user@spark.apache.org by Franc Carter <fr...@gmail.com> on 2016/01/09 22:55:15 UTC

pyspark: calculating row deltas

Hi,

I have a DataFrame with the columns

     ID,Year,Value

I'd like to create a new column that is Value2 - Value1, where the two
values come from rows with the same ID and Year1 = Year2 - 1.

At the moment I am creating a new DataFrame with renamed columns and doing

   DF.join(DF2, . . . .)

This looks cumbersome to me. Is there a better way?

thanks


-- 
Franc

Re: pyspark: calculating row deltas

Posted by Franc Carter <fr...@gmail.com>.
Thanks

cheers

On 10 January 2016 at 22:35, Blaž Šnuderl <sn...@gmail.com> wrote:

> This can be done using spark.sql and window functions. Take a look at
> https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html


-- 
Franc

Re: pyspark: calculating row deltas

Posted by Blaž Šnuderl <sn...@gmail.com>.
This can be done using spark.sql and window functions. Take a look at
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html

On Sun, Jan 10, 2016 at 11:07 AM, Franc Carter <fr...@gmail.com>
wrote:

>
> Sure, for a dataframe that looks like this
>
> ID Year Value
>  1 2012   100
>  1 2013   102
>  1 2014   106
>  2 2012   110
>  2 2013   118
>  2 2014   128
>
> I'd like to get back
>
> ID Year Value
>  1 2013     2
>  1 2014     4
>  2 2013     8
>  2 2014    10
>
> i.e. the Value for an (ID, Year) combination is the Value for that (ID, Year)
> minus the Value for (ID, Year-1).
>
> thanks
> --
> Franc
>

Re: pyspark: calculating row deltas

Posted by Franc Carter <fr...@gmail.com>.
Sure, for a dataframe that looks like this

ID Year Value
 1 2012   100
 1 2013   102
 1 2014   106
 2 2012   110
 2 2013   118
 2 2014   128

I'd like to get back

ID Year Value
 1 2013     2
 1 2014     4
 2 2013     8
 2 2014    10

i.e. the Value for an (ID, Year) combination is the Value for that (ID, Year)
minus the Value for (ID, Year-1).
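The rule above can be pinned down with a small plain-Python sketch (no Spark involved) that reproduces the desired output from the example table. `row_deltas` is a hypothetical helper, useful mainly as a reference when checking a Spark implementation.

```python
def row_deltas(rows):
    """For each (ID, Year, Value) row whose (ID, Year-1) row also exists,
    return (ID, Year, Value - previous Value). Rows without a Year-1
    partner, such as each ID's first year, are skipped."""
    # Index values by (ID, Year) so the Year-1 lookup is a dict lookup.
    value_at = {(id_, year): value for id_, year, value in rows}
    deltas = []
    for id_, year, value in sorted(rows):
        prev = value_at.get((id_, year - 1))
        if prev is not None:
            deltas.append((id_, year, value - prev))
    return deltas


data = [(1, 2012, 100), (1, 2013, 102), (1, 2014, 106),
        (2, 2012, 110), (2, 2013, 118), (2, 2014, 128)]
print(row_deltas(data))  # [(1, 2013, 2), (1, 2014, 4), (2, 2013, 8), (2, 2014, 10)]
```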

thanks

On 10 January 2016 at 20:51, Femi Anthony <fe...@gmail.com> wrote:

> Can you clarify what you mean with an actual example?
>
> For example, if your data frame looks like this:
>
> ID  Year   Value
> 1    2012   100
> 2    2013   101
> 3    2014   102
>
> What's your desired output?
>
> Femi
>
>
> --
> http://www.femibyte.com/twiki5/bin/view/Tech/
> http://www.nextmatrix.com
> "Great spirits have always encountered violent opposition from mediocre
> minds." - Albert Einstein.
>



-- 
Franc

Re: pyspark: calculating row deltas

Posted by Femi Anthony <fe...@gmail.com>.
Can you clarify what you mean with an actual example?

For example, if your data frame looks like this:

ID  Year   Value
1    2012   100
2    2013   101
3    2014   102

What's your desired output?

Femi


On Sat, Jan 9, 2016 at 4:55 PM, Franc Carter <fr...@gmail.com> wrote:

>
> Hi,
>
> I have a DataFrame with the columns
>
>      ID,Year,Value
>
> I'd like to create a new column that is Value2 - Value1, where the two
> values come from rows with the same ID and Year1 = Year2 - 1.
>
> At the moment I am creating a new DataFrame with renamed columns and doing
>
>    DF.join(DF2, . . . .)
>
> This looks cumbersome to me. Is there a better way?
>
> thanks
>
>
> --
> Franc
>



-- 
http://www.femibyte.com/twiki5/bin/view/Tech/
http://www.nextmatrix.com
"Great spirits have always encountered violent opposition from mediocre
minds." - Albert Einstein.