Posted to dev@mahout.apache.org by "Trevor Grant (JIRA)" <ji...@apache.org> on 2017/03/05 01:11:32 UTC

[jira] [Created] (MAHOUT-1950) Unread Block Data in Spark Shell Pseudo Cluster

Trevor Grant created MAHOUT-1950:
------------------------------------

             Summary: Unread Block Data in Spark Shell Pseudo Cluster
                 Key: MAHOUT-1950
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1950
             Project: Mahout
          Issue Type: Bug
          Components: Mahout spark shell
    Affects Versions: 0.13.0
         Environment: Spark 1.6.3 Cluster / Pseudo Cluster / YARN Cluster (all observed)
            Reporter: Trevor Grant
            Assignee: Trevor Grant
            Priority: Blocker


When performing an operation in the Spark shell on a pseudo cluster, a `java.lang.IllegalStateException: unread block data` error is thrown.

Research and the stack trace imply an issue with serialization. Other reports of Spark problems in cluster mode hint that the Kryo jars aren't being shipped to the executors.

Experimentation has shown that the following invocation:

`$SPARK_HOME/bin/spark-shell \
  --jars "/opt/mahout/math-scala/target/mahout-math-scala_2.10-0.13.0-SNAPSHOT.jar,/opt/mahout/math/target/mahout-math-0.13.0-SNAPSHOT.jar,/opt/mahout/spark/target/mahout-spark_2.10-0.13.0-SNAPSHOT.jar,/opt/mahout/spark/target/mahout-spark_2.10-0.13.0-SNAPSHOT-dependency-reduced.jar" \
  -i $MAHOUT_HOME/bin/load-shell.scala \
  --conf spark.kryo.referenceTracking=false \
  --conf spark.kryo.registrator=org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator \
  --conf spark.kryoserializer.buffer=32k \
  --conf spark.kryoserializer.buffer.max=600m \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer`

works, and should be used in place of:
https://github.com/apache/mahout/blob/master/bin/mahout#L294
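A minimal sketch of how the fix might look inside a launcher script, assuming the Kryo-related conf flags are assembled into a single variable and appended to the spark-shell command line (the variable name KRYO_CONF is an illustrative assumption, not the actual variable used in bin/mahout):

```shell
#!/bin/sh
# Hypothetical sketch: collect the Kryo settings from the working invocation
# above into one variable that a launcher script could append to its
# spark-shell command line. Flag values are taken verbatim from the ticket.
KRYO_CONF="--conf spark.kryo.referenceTracking=false \
--conf spark.kryo.registrator=org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator \
--conf spark.kryoserializer.buffer=32k \
--conf spark.kryoserializer.buffer.max=600m \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer"

# A real script would do something like:
#   "$SPARK_HOME/bin/spark-shell" --jars "$MAHOUT_JARS" $KRYO_CONF ...
# Here we just print the assembled flags to show what would be passed along.
echo "$KRYO_CONF"
```

The key point is that these settings must reach the driver and executors at launch time; setting them after the SparkContext exists is too late for the serializer to take effect.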



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)