Posted to issues@spark.apache.org by "Yikun Jiang (Jira)" <ji...@apache.org> on 2021/07/29 03:59:00 UTC

[jira] [Commented] (SPARK-36000) Support creation and operations of ps.Series/Index with Decimal('NaN')

    [ https://issues.apache.org/jira/browse/SPARK-36000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389243#comment-17389243 ] 

Yikun Jiang commented on SPARK-36000:
-------------------------------------

[~XinrongM] I did some investigation on this and found that the problem is triggered during Python-to-Java unpickling, because Decimal('NaN') is not supported by net.razorvine.pickle.

In Python

{code:java}
>>> import decimal, pickle, cloudpickle
>>> pickled = cloudpickle.dumps(decimal.Decimal('NaN'))
>>> pickled
b'\x80\x05\x95!\x00\x00\x00\x00\x00\x00\x00\x8c\x07decimal\x94\x8c\x07Decimal\x94\x93\x94\x8c\x03NaN\x94\x85\x94R\x94.'
>>> pickle.loads(pickled)
Decimal('NaN')
{code}
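
To make it clearer what the Java side has to handle, here is a small sketch that disassembles the same payload with the standard-library pickletools module. It only shows that the pickle asks the reader to call decimal.Decimal with the single string argument 'NaN'. My assumption (not verified against the library source) is that the Java unpickler maps decimal.Decimal to a Java type whose constructor rejects that string, which would match the AnyClassConstructor frame in the stack trace below.

```python
import pickle
import pickletools
from decimal import Decimal

# Payload copied from the Python session above (pickle protocol 5).
payload = (
    b"\x80\x05\x95!\x00\x00\x00\x00\x00\x00\x00"
    b"\x8c\x07decimal\x94\x8c\x07Decimal\x94\x93\x94"
    b"\x8c\x03NaN\x94\x85\x94R\x94."
)

# Collect (opcode name, argument) pairs, skipping protocol/framing/memo
# bookkeeping so that only the object-construction steps remain.
ops = [
    (op.name, arg)
    for op, arg, _pos in pickletools.genops(payload)
    if op.name not in ("PROTO", "FRAME", "MEMOIZE")
]
for name, arg in ops:
    print(name, arg)

# Python can rebuild the value because Decimal accepts the string 'NaN'.
value = pickle.loads(payload)
print(type(value).__name__, value.is_nan())
```

The interesting part is the STACK_GLOBAL / TUPLE1 / REDUCE sequence: any unpickler, in any language, has to be able to construct a decimal from the string 'NaN' to read this payload.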

In Scala

{code:java}
scala> import net.razorvine.pickle.{Pickler, Unpickler, PickleUtils}
scala> val unpickle = new Unpickler
scala> unpickle.loads(PickleUtils.str2bytes("\u0080\u0005\u0095!\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u008c\u0007decimal\u0094\u008c\u0007Decimal\u0094\u0093\u0094\u008c\u0003NaN\u0094\u0085\u0094R\u0094."))
net.razorvine.pickle.PickleException: problem construction object: java.lang.reflect.InvocationTargetException
 at net.razorvine.pickle.objects.AnyClassConstructor.construct(AnyClassConstructor.java:29)
 at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:773)
 at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:213)
 at net.razorvine.pickle.Unpickler.load(Unpickler.java:123)
 at net.razorvine.pickle.Unpickler.loads(Unpickler.java:136)
 ... 48 elided
{code}

I submitted a PR to pickle upstream: https://github.com/irmen/pickle/issues/7 . It looks like we can only continue this Jira after that fix lands and we bump the pickle version to the fixed version.

> Support creation and operations of ps.Series/Index with Decimal('NaN')
> ----------------------------------------------------------------------
>
>                 Key: SPARK-36000
>                 URL: https://issues.apache.org/jira/browse/SPARK-36000
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Xinrong Meng
>            Priority: Major
>
> The creation and operations of ps.Series/Index with Decimal('NaN') don't work as expected.
> That might be due to an underlying PySpark limitation.
> Please refer to sub-tasks for issues detected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
