Posted to issues@systemml.apache.org by "Berthold Reinwald (JIRA)" <ji...@apache.org> on 2017/03/03 08:55:45 UTC
[jira] [Created] (SYSTEMML-1370) Py4JError: An error occurred while calling z:org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt.convertPy4JArrayToMB.
Berthold Reinwald created SYSTEMML-1370:
-------------------------------------------
Summary: Py4JError: An error occurred while calling z:org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt.convertPy4JArrayToMB.
Key: SYSTEMML-1370
URL: https://issues.apache.org/jira/browse/SYSTEMML-1370
Project: SystemML
Issue Type: Bug
Components: APIs
Affects Versions: Not Applicable
Environment: pyspark with local Spark 2.1
Reporter: Berthold Reinwald
Do we have undocumented limits for RDDConverterUtilsExt.convertPy4JArrayToMB?
The simple script below works for 23100 rows but fails for 46900 rows. The following steps reproduce the failure easily and consistently.
START:
$pyspark --master local --jars $SYSTEMML_HOME/SystemML.jar --driver-memory 8G --executor-memory 2G
PYTHON SCRIPT:
from systemml import MLContext, dml
import pandas as pd

# `sc` is the SparkContext provided by the pyspark shell
ml = MLContext(sc)
print("Spark Version: " + sc.version)
print("SystemML Version: " + ml.version())
print("SystemML Built-Time: " + ml.buildTime())

# !! number of rows 23100 works, while 46900 fails
nr = 46900
# nr x 784 matrix of doubles holding the values 1..nr*784
X_pd = pd.DataFrame(range(1, (nr*784)+1, 1), dtype=float).values.reshape(nr, 784)

# DML script that only writes the input matrix X to a CSV file
script = """
write(X, $Xfile, format="csv")
"""
prog = dml(script).input(X=X_pd).input(**{"$Xfile": "/tmp/X_pd.csv"})
ml.execute(prog)
OUTPUT:
Spark Version: 2.1.0
SystemML Version: 0.14.0-incubating-SNAPSHOT
SystemML Built-Time: 2017-03-03 07:33:40 UTC
---------------------------------------------------------------------------
Py4JError Traceback (most recent call last)
.......
Py4JError: An error occurred while calling z:org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt.convertPy4JArrayToMB. Trace:
java.lang.NegativeArraySizeException
at py4j.Base64.decode(Base64.java:321)
at py4j.Protocol.getBytes(Protocol.java:173)
at py4j.Protocol.getObject(Protocol.java:294)
at py4j.commands.AbstractCommand.getArguments(AbstractCommand.java:82)
at py4j.commands.CallCommand.execute(CallCommand.java:77)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
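For reference, here is a rough sketch of the payload sizes involved for the two row counts. It assumes the whole matrix is handed to Py4J as one byte array of 8-byte doubles (dtype=float) and is Base64-encoded on the way to the JVM, as the py4j.Base64.decode frame in the trace suggests. The helper name payload_sizes is hypothetical and not part of SystemML or Py4J, and the numbers only show the scale of the transfer; they do not establish where the limit actually lies.

```python
import base64


def payload_sizes(rows, cols=784, bytes_per_value=8):
    """Return (raw_bytes, base64_chars) for a dense rows x cols matrix.

    Assumption: values are 8-byte doubles sent as a single byte array,
    and the Base64 text has no line breaks (4 output chars per 3 input bytes).
    """
    raw = rows * cols * bytes_per_value
    b64 = 4 * ((raw + 2) // 3)  # padded Base64 length without newlines
    return raw, b64


# Sanity check of the length formula against the standard library
assert len(base64.b64encode(b"x" * 10)) == 4 * ((10 + 2) // 3)

for nr in (23100, 46900):
    raw, b64 = payload_sizes(nr)
    print("rows=%d raw_bytes=%d base64_chars=%d" % (nr, raw, b64))
```

Both payloads are in the hundreds of megabytes, so the question is whether a size-dependent limit in this conversion path is reached somewhere between the two row counts.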