Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/07/31 03:43:00 UTC
[jira] [Commented] (SPARK-24976) Allow None for Decimal type conversion (specific to Arrow 0.9.0)

    [ https://issues.apache.org/jira/browse/SPARK-24976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16563117#comment-16563117 ] 

Apache Spark commented on SPARK-24976:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/21928

> Allow None for Decimal type conversion (specific to Arrow 0.9.0)
> ----------------------------------------------------------------
>
>                 Key: SPARK-24976
>                 URL: https://issues.apache.org/jira/browse/SPARK-24976
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> See https://jira.apache.org/jira/browse/ARROW-2432
> If we use Arrow 0.9.0, the test case for None as a decimal fails as shown below:
> {code}
> Traceback (most recent call last):
>   File "/.../spark/python/pyspark/sql/tests.py", line 4672, in test_vectorized_udf_null_decimal
>     self.assertEquals(df.collect(), res.collect())
>   File "/.../spark/python/pyspark/sql/dataframe.py", line 533, in collect
>     sock_info = self._jdf.collectToPython()
>   File "/.../spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
>     answer, self.gateway_client, self.target_id, self.name)
>   File "/.../spark/python/pyspark/sql/utils.py", line 63, in deco
>     return f(*a, **kw)
>   File "/.../spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
>     format(target_id, ".", name), value)
> Py4JJavaError: An error occurred while calling o51.collectToPython.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 1.0 failed 1 times, most recent failure: Lost task 3.0 in stage 1.0 (TID 7, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>   File "/.../spark/python/pyspark/worker.py", line 320, in main
>     process()
>   File "/.../spark/python/pyspark/worker.py", line 315, in process
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/.../spark/python/pyspark/serializers.py", line 274, in dump_stream
>     batch = _create_batch(series, self._timezone)
>   File "/.../spark/python/pyspark/serializers.py", line 243, in _create_batch
>     arrs = [create_array(s, t) for s, t in series]
>   File "/.../spark/python/pyspark/serializers.py", line 241, in create_array
>     return pa.Array.from_pandas(s, mask=mask, type=t)
>   File "array.pxi", line 383, in pyarrow.lib.Array.from_pandas
>   File "array.pxi", line 177, in pyarrow.lib.array
>   File "error.pxi", line 77, in pyarrow.lib.check_status
>   File "error.pxi", line 77, in pyarrow.lib.check_status
> ArrowInvalid: Error converting from Python objects to Decimal: Got Python object of type NoneType but can only handle these types: decimal.Decimal
> {code} 
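Per ARROW-2432, PyArrow 0.9.0 rejects a None while converting Python objects to a decimal array. This happens even though the call shown in the traceback already passes a mask argument to pa.Array.from_pandas. Below is one possible workaround, sketched under assumptions and not necessarily what the linked pull request does. It replaces each None with a placeholder decimal.Decimal so the converter only sees Decimal objects, and it records a boolean mask so those slots are still treated as null. The helper names fill_null_decimals and needs_decimal_workaround are hypothetical. The sketch also assumes that PyArrow ignores the placeholder values in masked slots, and that the bug affects only the 0.9.x line, as the issue title suggests. The code uses only the standard library. In PySpark, the filled values and mask would be what you hand to pa.Array.from_pandas(..., mask=mask, type=t).

```python
import decimal
from typing import List, Optional, Sequence, Tuple


def needs_decimal_workaround(pyarrow_version: str) -> bool:
    """Return True for PyArrow 0.9.x (hypothetical helper).

    Assumption: ARROW-2432 only affects the 0.9.x release line.
    """
    major, minor = (int(part) for part in pyarrow_version.split(".")[:2])
    return (major, minor) == (0, 9)


def fill_null_decimals(
    values: Sequence[Optional[decimal.Decimal]],
) -> Tuple[List[decimal.Decimal], List[bool]]:
    """Replace None with a placeholder Decimal and return a null mask.

    Hypothetical helper: Decimal('NaN') is only a stand-in so every element
    is a decimal.Decimal; the mask (True = null) says which slots are missing.
    """
    placeholder = decimal.Decimal("NaN")
    filled = [placeholder if v is None else v for v in values]
    mask = [v is None for v in values]
    return filled, mask


if __name__ == "__main__":
    if needs_decimal_workaround("0.9.0"):
        filled, mask = fill_null_decimals([decimal.Decimal("1.5"), None])
        print(mask)  # [False, True]
```

The placeholder is applied only on the affected version, so newer PyArrow releases would still receive the original None values unchanged.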



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
