Posted to issues@systemml.apache.org by "Berthold Reinwald (JIRA)" <ji...@apache.org> on 2017/03/03 08:55:45 UTC

[jira] [Created] (SYSTEMML-1370) Py4JError: An error occurred while calling z:org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt.convertPy4JArrayToMB.

Berthold Reinwald created SYSTEMML-1370:
-------------------------------------------

             Summary: Py4JError: An error occurred while calling z:org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt.convertPy4JArrayToMB.
                 Key: SYSTEMML-1370
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1370
             Project: SystemML
          Issue Type: Bug
          Components: APIs
    Affects Versions: Not Applicable
         Environment: pyspark with local Spark 2.1


            Reporter: Berthold Reinwald


Do we have undocumented limits for RDDConverterUtilsExt.convertPy4JArrayToMB?

The simple script below works for 23100 rows, while 46900 rows fail. This is an easy and consistent way to reproduce the error:
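
Note (back-of-the-envelope, not a confirmed root cause): with 784 columns of
8-byte doubles, the working and failing sizes bracket ~256 MiB. The
py4j.Base64.decode frame in the trace below appears to be a MiG-style Base64
decoder, which computes roughly (length * 6) >> 3 in 32-bit Java arithmetic;
that product goes negative once the encoded string exceeds ~2^31/6 characters,
which would explain the NegativeArraySizeException:

# Rough payload sizes for the single Py4J transfer (assumption: the
# nr x 784 double array is Base64-encoded in one call, per the trace below).
cols, dbl = 784, 8
for nr in (23100, 46900):
    raw = nr * cols * dbl                 # bytes before encoding
    b64 = raw * 4 / 3                     # Base64 inflates by ~4/3
    print nr, raw, b64 * 6 > 2**31 - 1    # True -> 32-bit overflow in decode
# 23100 -> 144883200 bytes (~193M encoded): fits
# 46900 -> 294156800 bytes (~392M encoded): len*6 overflows a Java int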

START:
$pyspark --master local --jars $SYSTEMML_HOME/SystemML.jar --driver-memory 8G --executor-memory 2G

PYTHON SCRIPT:
from systemml import MLContext, dml
import pandas as pd

# `sc` is predefined by the pyspark shell
ml = MLContext(sc)
print "Spark Version:", sc.version
print "SystemML Version:", ml.version()
print "SystemML Built-Time:", ml.buildTime()

# !! 23100 rows work, while 46900 rows fail
nr = 46900

X_pd = pd.DataFrame(range(1, nr * 784 + 1), dtype=float).values.reshape(nr, 784)

script = """
    write(X, $Xfile, format="csv")
"""
prog = dml(script).input(X=X_pd).input(**{"$Xfile": "/tmp/X_pd.csv"})
ml.execute(prog)

OUTPUT:
Spark Version: 2.1.0
SystemML Version: 0.14.0-incubating-SNAPSHOT
SystemML Built-Time: 2017-03-03 07:33:40 UTC
---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
.......

Py4JError: An error occurred while calling z:org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt.convertPy4JArrayToMB. Trace:
java.lang.NegativeArraySizeException
	at py4j.Base64.decode(Base64.java:321)
	at py4j.Protocol.getBytes(Protocol.java:173)
	at py4j.Protocol.getObject(Protocol.java:294)
	at py4j.commands.AbstractCommand.getArguments(AbstractCommand.java:82)
	at py4j.commands.CallCommand.execute(CallCommand.java:77)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:745)
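
A possible workaround until the converter handles payloads of this size (an
untested sketch; it assumes MLContext also accepts Spark DataFrame inputs, so
rows are shipped through Spark rather than in one Py4J call):

import pandas as pd
from systemml import MLContext, dml

# `sc` and `spark` are predefined by the pyspark shell
nr = 46900
X = pd.DataFrame(range(1, nr * 784 + 1), dtype=float).values.reshape(nr, 784)
X_df = spark.createDataFrame(pd.DataFrame(X))  # distributed, no single ~294 MB transfer

ml = MLContext(sc)
prog = dml('write(X, $Xfile, format="csv")').input(X=X_df).input(**{"$Xfile": "/tmp/X_pd.csv"})
ml.execute(prog)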

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)