Posted to issues@spark.apache.org by "Shubham Patil (Jira)" <ji...@apache.org> on 2024/01/09 15:33:00 UTC
[jira] [Created] (SPARK-46636) PySpark throwing TypeError while collecting an RDD
Shubham Patil created SPARK-46636:
-------------------------------------
Summary: PySpark throwing TypeError while collecting an RDD
Key: SPARK-46636
URL: https://issues.apache.org/jira/browse/SPARK-46636
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 3.3.4
Environment: Running this in anaconda jupyter notebook
Python== 3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23)
[MSC v.1916 64 bit (AMD64)]
Spark== 3.3.4
pyspark== 3.4.1
Reporter: Shubham Patil
I'm trying to collect an RDD after applying a filter to it, but it throws an error.
The error can be reproduced with the code below:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("Practice").getOrCreate()
sc = spark.sparkContext

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
dataRdd = sc.parallelize(data)
dataRdd = dataRdd.filter(lambda a: a % 2 == 0)
dataRdd.collect()
{code}
Below is the error that it throws:
{code:python}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[18], line 1
----> 1 dataRdd.collect()

File ~\anaconda3\envs\spark_latest\Lib\site-packages\pyspark\rdd.py:1814, in RDD.collect(self)
   1812 with SCCallSiteSync(self.context):
   1813     assert self.ctx._jvm is not None
-> 1814     sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
   1815 return list(_load_from_socket(sock_info, self._jrdd_deserializer))

File ~\anaconda3\envs\spark_latest\Lib\site-packages\pyspark\rdd.py:5441, in PipelinedRDD._jrdd(self)
   5438 else:
   5439     profiler = None
-> 5441 wrapped_func = _wrap_function(
   5442     self.ctx, self.func, self._prev_jrdd_deserializer, self._jrdd_deserializer, profiler
   5443 )
   5445 assert self.ctx._jvm is not None
   5446 python_rdd = self.ctx._jvm.PythonRDD(
   5447     self._prev_jrdd.rdd(), wrapped_func, self.preservesPartitioning, self.is_barrier
   5448 )

File ~\anaconda3\envs\spark_latest\Lib\site-packages\pyspark\rdd.py:5243, in _wrap_function(sc, func, deserializer, serializer, profiler)
   5241 pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
   5242 assert sc._jvm is not None
-> 5243 return sc._jvm.SimplePythonFunction(
   5244     bytearray(pickled_command),
   5245     env,
   5246     includes,
   5247     sc.pythonExec,
   5248     sc.pythonVer,
   5249     broadcast_vars,
   5250     sc._javaAccumulator,
   5251 )

TypeError: 'JavaPackage' object is not callable
{code}
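This error usually points at a version mismatch: the Environment section above lists Spark 3.3.4 alongside pyspark 3.4.1, and the traceback shows pyspark looking up `sc._jvm.SimplePythonFunction` — when the JVM side does not ship that class, py4j resolves the name to a `JavaPackage` stub, which is not callable. A minimal sketch of the sanity check (the helper names here are hypothetical, not PySpark API):

```python
# Hypothetical helpers (not part of PySpark): the pyspark package and the
# Spark installation should agree on major.minor, otherwise the Python side
# may call JVM classes the installed Spark does not provide.

def major_minor(version: str) -> tuple[int, int]:
    """Return the (major, minor) components of a version string like '3.4.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

def versions_compatible(pyspark_version: str, spark_version: str) -> bool:
    """True when pyspark and Spark share the same major.minor release line."""
    return major_minor(pyspark_version) == major_minor(spark_version)

# The versions reported in this issue do not match:
print(versions_compatible("3.4.1", "3.3.4"))  # False
```

In a live session the two version strings come from `pyspark.__version__` and `spark.version`; aligning them (e.g. installing pyspark 3.3.x against Spark 3.3.4) would be the first thing to try.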
--
This message was sent by Atlassian Jira
(v8.20.10#820010)