Posted to user@ignite.apache.org by Tracyl <tl...@bloomberg.net> on 2016/10/06 14:50:13 UTC

Fail to cache rdd: java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;

Hi team,

I was trying to cache a dataframe in Ignite Cache. I was able to cache generic
type data elements (RDD<Int, Int>). However, each time I use
igniteRDD.saveValues() to cache a non-generic data type (e.g. RDD<Row>), it
triggers the NoSuchMethodError in saveValues shown below. I am using
Scala 2.10, Spark 1.6.1, and Ignite 1.6.0. I did find that ignite-spark was
pulling in spark-core_2.11. After excluding the native Spark dependency from
ignite-spark, I still got the same error. Any suggestions? Thanks in advance!
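
A common cause of this particular NoSuchMethodError is mixing Scala 2.10 and 2.11 binaries on the same classpath: scala.Predef.$conforms exists only in the 2.11 standard library. As a sketch of one way to pin everything to Scala 2.10 with Maven (the artifact coordinates below are illustrative and should be checked against the actual Ignite 1.6.0 release):

```xml
<!-- Sketch: align the build on Scala 2.10. Verify that a _2.10 build of
     the ignite-spark module exists for this version before relying on it. -->
<dependency>
  <groupId>org.apache.ignite</groupId>
  <artifactId>ignite-spark_2.10</artifactId>
  <version>1.6.0</version>
  <exclusions>
    <!-- Keep the transitive 2.11 Spark core off the classpath. -->
    <exclusion>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

If no 2.10 build of the module is published for this version, building ignite-spark from source against Scala 2.10 would be the fallback.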

Code sample:
igniteRDD.saveValues(df.rdd());

Exceptions I got:
Driver stacktrace:
        at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
        at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at scala.Option.foreach(Option.scala:236)
        at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
        at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
        at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
        at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
        at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:920)
        at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:918)
        at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:918)
        at org.apache.ignite.spark.IgniteRDD.saveValues(IgniteRDD.scala:138)
        at ignitecontext.IgniteRDDExample.run(IgniteRDDExample.java:81)
        at ignitecontext.IgniteRDDExample.main(IgniteRDDExample.java:35)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoSuchMethodError:
scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
        at
org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1$$anonfun$apply$1.apply(IgniteRDD.scala:151)
        at
org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1$$anonfun$apply$1.apply(IgniteRDD.scala:150)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at
org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1.apply(IgniteRDD.scala:150)
        at
org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1.apply(IgniteRDD.scala:138)
        at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
        at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
        at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
        at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
        at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Fail-to-cache-rdd-java-lang-NoSuchMethodError-scala-Predef-conforms-Lscala-Predef-less-colon-less-tp8116.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Fail to cache rdd: java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;

Posted by vkulichenko <va...@gmail.com>.
Hi Tracy,

Here are my answers:

1. Correct. IgniteContext is essentially a wrapper around SparkContext that
provides connectivity to the Ignite cluster.
2. Please refer to [1] for information on how to load data into the cache. You
can implement your own CacheStore or use IgniteDataStreamer. In any case, you
should create an entry per row; having a whole DataFrame as a value doesn't
make sense.
3. I think the answer is no. The only thing you need to create an
IgniteContext is a SparkContext.
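
For reference, constructing an IgniteContext from an existing SparkContext can be sketched as below. The cache name and configuration are assumptions, not taken from the thread, and actually running this requires a live Ignite node:

```scala
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.Row

// Only a SparkContext is needed; the closure supplies the Ignite configuration
// lazily on each node.
val sc = new SparkContext(new SparkConf().setAppName("ignite-sketch"))
val ic = new IgniteContext[String, Row](sc, () => new IgniteConfiguration())

// "partitioned" is a hypothetical cache name.
val igniteRDD = ic.fromCache("partitioned")
```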

[1] https://apacheignite.readme.io/docs/data-loading

-Val




Re: Fail to cache rdd: java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;

Posted by Tracyl <tl...@bloomberg.net>.
Hi Denis,

This is really helpful. Yes, I do need the original dataframe for other APIs.

Now I am using RDD[(String, Row)] as the type and caching the dataframe with:
val rdd = df.map(row => (row.getAs[String]("KEY"), row))
igniteRDD.savePairs(rdd)

It works perfectly fine. I was also able to reconstruct the dataframe from the
values collection and the schema of the RDD.
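
Putting the pieces together, the round trip described above might look like this sketch. Here sqlContext, df, and the "KEY" column are assumptions based on the thread's context, and running it needs a live Spark + Ignite setup:

```scala
import org.apache.spark.sql.SQLContext

// Key each Row by its "KEY" column and store the pairs in Ignite.
val pairs = df.map(row => (row.getAs[String]("KEY"), row))
igniteRDD.savePairs(pairs)

// Rebuild the DataFrame later: IgniteRDD[String, Row] is an RDD of
// (key, value) tuples, so take the values and reuse the saved schema.
val sqlContext = new SQLContext(sc)
val restored = sqlContext.createDataFrame(igniteRDD.map(_._2), df.schema)
```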

Tracy




Re: Fail to cache rdd: java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;

Posted by Denis Magda <dm...@gridgain.com>.
Presently, you need to convert your data frame into key-value tuples and
insert them into an IgniteRDD:
https://apacheignite-fs.readme.io/docs/ignitecontext-igniterdd#section-saving-values-to-ignite

To read your data back you can leverage Ignite’s SQL engine, which is much
faster than Spark’s:
https://apacheignite-fs.readme.io/docs/ignitecontext-igniterdd#section-running-sql-queries-against-ignite-cache

Alternatively, you can retrieve the data using Spark RDD based methods:
https://apacheignite-fs.readme.io/docs/ignitecontext-igniterdd#section-reading-values-from-ignite

After you get your data back you can turn it into a DataFrame, but I’m not sure that transformation is really needed unless you process the DataFrames later with some other API.
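
As an illustration of the SQL route, IgniteRDD exposes a sql() method that returns a DataFrame directly. The table and column names below are hypothetical and depend on how the cache's query entities are configured:

```scala
// Runs on Ignite's SQL engine; requires the value type to be registered
// as a queryable entity in the cache configuration.
val resultDF = igniteRDD.sql("select * from Person where name = ?", "John")
resultDF.show()
```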

—
Denis

> On Oct 6, 2016, at 11:05 AM, Tracyl <tl...@bloomberg.net> wrote:
> 
> Thanks for the prompt reply. So if I want to cache a dataframe in IgniteCache, I
> have to define a custom data model class (e.g.
> https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/model/Person.java
> ) as the schema of the dataframe, then construct objects and declare the cache
> to be of the data model type? In other words, I have to do DataFrame => custom
> class objects => IgniteRDD, and when I retrieve it, IgniteRDD => custom class
> objects => DataFrame, right?
> 
> Do I have other options for caching a dataframe?
> 
> 
> 


Re: Fail to cache rdd: java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;

Posted by Tracyl <tl...@bloomberg.net>.
Thanks for the prompt reply. So if I want to cache a dataframe in IgniteCache, I
have to define a custom data model class (e.g.
https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/model/Person.java
) as the schema of the dataframe, then construct objects and declare the cache
to be of the data model type? In other words, I have to do DataFrame => custom
class objects => IgniteRDD, and when I retrieve it, IgniteRDD => custom class
objects => DataFrame, right?

Do I have other options for caching a dataframe?




Re: Fail to cache rdd: java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;

Posted by Denis Magda <dm...@gridgain.com>.
Hi,

Presently the Ignite integration for Spark is not intended to be used with Spark Data Frames. There is a ticket you can track that was created to fill this gap:
https://issues.apache.org/jira/browse/IGNITE-3084

As for the NoSuchMethodError, this may be related to the fact that Ignite doesn’t use an up-to-date version of Spark. There is a ticket for this as well:
https://issues.apache.org/jira/browse/IGNITE-3822

In general, the tickets should be resolved in Q1 2017 if no one takes care of them earlier.

—
Denis

> On Oct 6, 2016, at 7:50 AM, Tracyl <tl...@bloomberg.net> wrote:
> 
> [original message and full stack trace snipped; quoted in the first message of this thread]