Posted to issues@hivemall.apache.org by "Makoto Yui (Jira)" <ji...@apache.org> on 2021/05/06 07:32:00 UTC

[jira] [Comment Edited] (HIVEMALL-306) KryoException occurred when running hivemall in Spark SQL for matrix factorization on MovieLens 1M dataset

    [ https://issues.apache.org/jira/browse/HIVEMALL-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340049#comment-17340049 ] 

Makoto Yui edited comment on HIVEMALL-306 at 5/6/21, 7:31 AM:
--------------------------------------------------------------

Spark 2.4.5 uses Hive version 1.2.1.spark2, as declared in its pom.xml:

[https://github.com/apache/spark/blob/v2.4.5/pom.xml#L129]
[https://github.com/apache/spark/blob/v2.4.5/pom.xml#L1466]

Hive 1.2.1.spark2

[https://mvnrepository.com/artifact/org.spark-project.hive/hive/1.2.1.spark2]

in turn depends on:
|[com.esotericsoftware.kryo|https://mvnrepository.com/artifact/com.esotericsoftware.kryo] » [kryo|https://mvnrepository.com/artifact/com.esotericsoftware.kryo/kryo]|[2.21|https://mvnrepository.com/artifact/com.esotericsoftware.kryo/kryo/2.21]|
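To confirm which jar actually supplies the Hive-shaded Kryo classes at runtime, a small classpath check like the following may help. This is only a sketch, not part of Hivemall or Spark: KryoLocator and locate are hypothetical names, and it assumes the program is run with the same classpath as the Spark driver (for example java -cp "$SPARK_HOME/jars/*:." KryoLocator). The shaded class name org.apache.hive.com.esotericsoftware.kryo.Kryo is taken from the stack trace in this issue; the unshaded name is included only for comparison.

```java
import java.security.CodeSource;
import java.util.Optional;

public class KryoLocator {

    // Returns the location (usually a jar path) the named class is loaded from,
    // or empty if the class is not on the classpath or has no code source
    // (for example, classes loaded by the bootstrap class loader).
    static Optional<String> locate(String className) {
        try {
            // initialize = false: only look the class up, do not run its static initializers.
            Class<?> cls = Class.forName(className, false, KryoLocator.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return Optional.empty();
            }
            return Optional.of(source.getLocation().toString());
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // The first name is the shaded Kryo package seen in the stack trace;
        // the second is the unshaded package, checked for comparison.
        String[] candidates = {
            "org.apache.hive.com.esotericsoftware.kryo.Kryo",
            "com.esotericsoftware.kryo.Kryo"
        };
        for (String name : candidates) {
            System.out.println(name + " -> " + locate(name).orElse("not on classpath"));
        }
    }
}
```

The lookup uses Class.forName without initialization so that merely probing for a class has no side effects.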

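Kryo 2.21 has a bug in serializing generic collections: the root cause in the trace below is an ArrayIndexOutOfBoundsException thrown from MapSerializer.setGenerics while Kryo serializes the itemBias field of hivemall.factorization.mf.FactorizedModel.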
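One plausible reading of that failure, sketched in plain Java as an illustration rather than Kryo's actual source: a map serializer expects a field to expose two type arguments (key and value), but a field declared with a map subtype that takes only one type parameter yields a one-element generics array, so reading index 1 throws ArrayIndexOutOfBoundsException: 1, matching the Caused by line in the trace. IntKeyMap, Model, buggySetGenerics and safeSetGenerics are hypothetical names, and the sketch assumes (without checking Hivemall's source) that the declared type of itemBias has this one-parameter shape.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.HashMap;

public class MapGenericsSketch {

    // Hypothetical Map subtype with ONE type parameter: the key type is fixed
    // to Integer, so only the value type is declared generically.
    static class IntKeyMap<V> extends HashMap<Integer, V> {}

    // Hypothetical model class with a field declared using that one-parameter type.
    static class Model {
        IntKeyMap<Float> itemBias = new IntKeyMap<>();
    }

    // Collects the type arguments written in a field declaration, i.e. the
    // "generics" a generic-aware field serializer would hand to a map serializer.
    static Class<?>[] declaredGenerics(Field field) {
        Type type = field.getGenericType();
        if (!(type instanceof ParameterizedType)) {
            return new Class<?>[0];
        }
        Type[] args = ((ParameterizedType) type).getActualTypeArguments();
        Class<?>[] generics = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            // Non-class arguments (wildcards, type variables) are recorded as null.
            generics[i] = (args[i] instanceof Class) ? (Class<?>) args[i] : null;
        }
        return generics;
    }

    // Pattern assumed to cause the failure: only "length > 0" is checked,
    // yet index 1 (the value type) is read unconditionally.
    static Class<?>[] buggySetGenerics(Class<?>[] generics) {
        Class<?> key = null;
        Class<?> value = null;
        if (generics != null && generics.length > 0) {
            key = generics[0];
            value = generics[1]; // throws ArrayIndexOutOfBoundsException when length == 1
        }
        return new Class<?>[] {key, value};
    }

    // Defensive variant: each index is read only if it exists.
    static Class<?>[] safeSetGenerics(Class<?>[] generics) {
        Class<?> key = (generics != null && generics.length > 0) ? generics[0] : null;
        Class<?> value = (generics != null && generics.length > 1) ? generics[1] : null;
        return new Class<?>[] {key, value};
    }

    public static void main(String[] args) throws NoSuchFieldException {
        Class<?>[] generics = declaredGenerics(Model.class.getDeclaredField("itemBias"));
        System.out.println("declared type arguments: " + generics.length);
        try {
            buggySetGenerics(generics);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("buggy variant failed: " + e);
        }
        Class<?>[] safe = safeSetGenerics(generics);
        System.out.println("safe variant: " + safe[0] + ", " + safe[1]);
    }
}
```

The safe variant only adds a length check before each read. Note that with a single type argument, index 0 holds the map's value type, so code that assumes a (key, value) order would also misinterpret it even when no exception is thrown.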



> KryoException occurred when running hivemall in Spark SQL for matrix factorization on MovieLens 1M dataset
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HIVEMALL-306
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-306
>             Project: Hivemall
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Bob
>            Assignee: Makoto Yui
>            Priority: Major
>         Attachments: image-2021-04-25-11-10-00-143.png, image-2021-04-25-11-11-09-950.png
>
>
> I am using Hivemall on Spark, following the MovieLens demo guide. The error occurred in section 9.3.3, which runs matrix factorization on the MovieLens 1M dataset.
> Could you please help me with this problem? Thanks a lot.
> The SQL statements are the same as those in the documentation. Here are the related errors:
> {code:java}
> 21/04/22 15:14:07 INFO TaskSchedulerImpl: Cancelling stage 2
> 21/04/22 15:14:07 INFO TaskSchedulerImpl: Cancelling stage 2
> 21/04/22 15:14:07 INFO TaskSchedulerImpl: Killing all running tasks in stage 2: Stage cancelled
> 21/04/22 15:14:07 INFO DAGScheduler: ShuffleMapStage 2 (sql at OfflineSqlTemplate.scala:30) failed in 0.306 s due to Job aborted due to stage failure: Task serialization failed: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.ArrayIndexOutOfBoundsException: 1
> Serialization trace:
> itemBias (hivemall.factorization.mf.FactorizedModel)
> model (hivemall.factorization.mf.MatrixFactorizationSGDUDTF)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.ArrayIndexOutOfBoundsException: 1
> Serialization trace:
> itemBias (hivemall.factorization.mf.FactorizedModel)
> model (hivemall.factorization.mf.MatrixFactorizationSGDUDTF)
>     at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:585)
>     at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>     at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:549)
>     at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:570)
>     at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>     at org.apache.hive.com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:486)
>     at org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.serializeObjectByKryo(HiveShim.scala:166)
>     at org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.serializePlan(HiveShim.scala:176)
>     at org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.writeExternal(HiveShim.scala:189)
>     at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>     at scala.collection.immutable.List$SerializationProxy.writeObject(List.scala:479)
>     at sun.reflect.GeneratedMethodAccessor137.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1140)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>     at scala.collection.immutable.List$SerializationProxy.writeObject(List.scala:479)
>     at sun.reflect.GeneratedMethodAccessor137.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1140)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>     at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
>     at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
>     at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1155)
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:1071)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:1074)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:1073)
>     at scala.collection.immutable.List.foreach(List.scala:392)
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:1073)
>     at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1014)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2069)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2061)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2050)
>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>     at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.setGenerics(MapSerializer.java:53)
>     at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:563)
>     ... 87 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)