Posted to user@spark.apache.org by ca...@free.fr on 2022/01/19 02:41:11 UTC
newbie question for reduce
Hello
Could someone help me understand why this simple reduce doesn't work?
>>> rdd = sc.parallelize([("a",1),("b",2),("c",3)])
>>>
>>> rdd.reduce(lambda x,y: x[1]+y[1])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/spark/python/pyspark/rdd.py", line 1001, in reduce
return reduce(f, vals)
File "/opt/spark/python/pyspark/util.py", line 74, in wrapper
return f(*args, **kwargs)
File "<stdin>", line 1, in <lambda>
TypeError: 'int' object is not subscriptable
>>>
spark 3.2.0
Thank you.
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: newbie question for reduce
Posted by Sean Owen <sr...@gmail.com>.
The problem is that you are reducing tuples but producing an int. The
resulting int can't then be combined with the remaining tuples by your
function: reduce() has to produce the same type as its arguments.
rdd.map(lambda x: x[1]).reduce(lambda x,y: x+y)
... would work
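As the traceback shows, PySpark's RDD.reduce applies Python's built-in reduce to the values in each partition, so the same fix can be sketched locally with a plain list standing in for the RDD (the list `pairs` here is an illustrative stand-in, not Spark code):

```python
from functools import reduce

pairs = [("a", 1), ("b", 2), ("c", 3)]

# Extract the int values first, so reduce always combines int with int.
values = [v for _, v in pairs]
total = reduce(lambda x, y: x + y, values)
print(total)  # 6
```

In PySpark itself, `rdd.map(lambda x: x[1]).sum()` or `rdd.values().sum()` would express the same thing more directly.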
RE: newbie question for reduce
Posted by Christopher Robson <CR...@scottlogic.com.INVALID>.
Hi,
The reduce lambda receives as its first argument the return value of the previous invocation. The first time, it is invoked with:
x = ("a", 1), y = ("b", 2)
And returns 1+2=3
Second time, it is invoked with
x = 3, y = ("c", 3)
so you can see why it raises the error that you are seeing.
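The failing call sequence can be reproduced with Python's built-in reduce alone, no Spark needed (the list `pairs` below is a local stand-in for the RDD's contents):

```python
from functools import reduce

pairs = [("a", 1), ("b", 2), ("c", 3)]

# First call:  x = ("a", 1), y = ("b", 2) -> returns 1 + 2 = 3
# Second call: x = 3 (an int), y = ("c", 3) -> 3[1] raises TypeError
try:
    reduce(lambda x, y: x[1] + y[1], pairs)
except TypeError as exc:
    error = str(exc)

print(error)  # 'int' object is not subscriptable
```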
There are several ways you could fix it. One way is to use a map before the reduce, e.g.
rdd.map(lambda x: x[1]).reduce(lambda x, y: x + y)
Hope that's helpful,
Chris
-----Original Message-----
From: capitnfrakass@free.fr <ca...@free.fr>
Sent: 19 January 2022 02:41
To: user@spark.apache.org
Subject: newbie question for reduce