Posted to issues@systemml.apache.org by "Niketan Pansare (JIRA)" <ji...@apache.org> on 2017/03/17 19:23:41 UTC

[jira] [Resolved] (SYSTEMML-1370) Py4JError: An error occurred while calling z:org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt.convertPy4JArrayToMB.

     [ https://issues.apache.org/jira/browse/SYSTEMML-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niketan Pansare resolved SYSTEMML-1370.
---------------------------------------
       Resolution: Fixed
    Fix Version/s: SystemML 1.0

Fixed in the commit https://github.com/apache/incubator-systemml/commit/81090134d2de04a3ae90c6f8d79b4c68cb14aab5

> Py4JError: An error occurred while calling z:org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt.convertPy4JArrayToMB.
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SYSTEMML-1370
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1370
>             Project: SystemML
>          Issue Type: Bug
>          Components: APIs
>    Affects Versions: Not Applicable
>         Environment: pyspark with local Spark 2.1
>            Reporter: Berthold Reinwald
>             Fix For: SystemML 1.0
>
>
> Do we have undocumented limits for RDDConverterUtilsExt.convertPy4JArrayToMB?
> The simple script below works for 23100 rows but fails for 46900 rows. It reproduces the problem easily and consistently.
> START:
> $ pyspark --master local --jars $SYSTEMML_HOME/SystemML.jar --driver-memory 8G --executor-memory 2G
> PYTHON SCRIPT:
> from __future__ import print_function
> from systemml import MLContext, dml
> import pandas as pd
> sc.version
> ml = MLContext(sc)
> print("Spark Version:", sc.version)
> print("SystemML Version:", ml.version())
> print("SystemML Built-Time:", ml.buildTime())
> # !! number of rows 23100 works, while 46900 fails
> nr = 46900
> X_pd = pd.DataFrame(range(1, (nr*784)+1,1),dtype=float).values.reshape(nr,784)
> script ="""
>     write(X, $Xfile, format="csv")
> """
> prog = dml(script).input(X=X_pd).input(**{"$Xfile":"/tmp/X_pd.csv"})
> ml.execute(prog)
> OUTPUT:
> Spark Version: 2.1.0
> SystemML Version: 0.14.0-incubating-SNAPSHOT
> SystemML Built-Time: 2017-03-03 07:33:40 UTC
> ---------------------------------------------------------------------------
> Py4JError                                 Traceback (most recent call last)
> .......
> Py4JError: An error occurred while calling z:org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt.convertPy4JArrayToMB. Trace:
> java.lang.NegativeArraySizeException
> 	at py4j.Base64.decode(Base64.java:321)
> 	at py4j.Protocol.getBytes(Protocol.java:173)
> 	at py4j.Protocol.getObject(Protocol.java:294)
> 	at py4j.commands.AbstractCommand.getArguments(AbstractCommand.java:82)
> 	at py4j.commands.CallCommand.execute(CallCommand.java:77)
> 	at py4j.GatewayConnection.run(GatewayConnection.java:214)
> 	at java.lang.Thread.run(Thread.java:745)
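The report does not explain why the call fails. One plausible reading of the trace, not checked against the Py4J source of that release, is a 32-bit integer overflow. The frames show Py4J decoding a Base64 string into bytes (py4j.Protocol.getBytes calling py4j.Base64.decode). If that decoder sizes its output buffer as ((encodedLength * 6) >> 3) - padding in Java int arithmetic, as the MiGBase64 code it appears to be derived from does, the computed length turns negative once the encoded string exceeds roughly 358 million characters, about 256 MiB of raw data. The sketch below tests that hypothesis against the two row counts in the report. It assumes a dense matrix of 8-byte doubles sent as a single payload with no separator characters. All function names are hypothetical.

```python
# Hypothesis check: does a 32-bit overflow in Base64 length arithmetic
# explain why 23100 rows work and 46900 rows fail?
# Assumptions (not confirmed by the report): float64 payload, one Base64
# string per call, no separator characters, and a decoder that computes
# ((encoded_len * 6) >> 3) - pad with Java int arithmetic.

INT_MAX = 2**31 - 1
BYTES_PER_DOUBLE = 8


def to_java_int(x):
    """Wrap an arbitrary Python int to a signed 32-bit Java int."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x > INT_MAX else x


def payload_sizes(n_rows, n_cols):
    """Return (raw bytes, Base64 chars) for a dense float64 matrix."""
    raw = n_rows * n_cols * BYTES_PER_DOUBLE
    encoded = 4 * ((raw + 2) // 3)  # 4 chars per 3-byte group, '='-padded
    return raw, encoded


def java_decoded_length(raw, encoded):
    """Mimic the assumed Java expression ((encoded * 6) >> 3) - pad."""
    pad = (3 - raw % 3) % 3  # number of trailing '=' characters
    product = to_java_int(encoded * 6)  # wraps around past INT_MAX
    # Python's >> on negative ints is an arithmetic shift, like Java's int >>.
    return to_java_int((product >> 3) - pad)


def check(n_rows, n_cols=784):
    """Return (raw bytes, Base64 chars, decoded length as Java would compute it)."""
    raw, encoded = payload_sizes(n_rows, n_cols)
    return raw, encoded, java_decoded_length(raw, encoded)


def max_safe_rows(n_cols=784):
    """Largest row count whose encoded length * 6 still fits in an int."""
    lo, hi = 0, INT_MAX // (n_cols * BYTES_PER_DOUBLE)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if payload_sizes(mid, n_cols)[1] * 6 <= INT_MAX:
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    for nr in (23100, 46900):
        raw, encoded, decoded = check(nr)
        print(f"{nr} rows: {raw:,} bytes, {encoded:,} Base64 chars, "
              f"decoded length {decoded:,}")
    print("max safe rows for 784 columns:", max_safe_rows())
```

Under these assumptions, the 23100-row matrix (144,883,200 bytes) decodes to its true length. The 46900-row matrix (294,156,800 bytes) produces a negative array length, which would be consistent with the NegativeArraySizeException in the trace. The estimated cutoff for 784 columns is 42799 rows. If the hypothesis holds, any payload under about 256 MiB, for example a matrix sent in smaller row blocks, would stay clear of the overflow.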



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)