Posted to user@ignite.apache.org by what0124 <j....@gmail.com> on 2017/01/13 13:48:04 UTC

Running Spark app in cluster!

Hello, 

I have been trying to run a simple Spark app under CDH with YARN. I set the
IGNITE_HOME path on all nodes and built the app with the JARs it needs, and
the nodes seem to discover each other, but when using shared RDDs in a shared
deployment I can't seem to retrieve all partitions. Most likely the partitions
I do retrieve from the cache reside locally. Any suggestions? Thanks!



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Running-Spark-app-in-cluster-tp10073.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Running Spark app in cluster!

Posted by vkulichenko <va...@gmail.com>.
I would suggest creating the IgniteConfiguration programmatically (use the
IgniteContext constructor that accepts a closure instead of an XML file path).
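For reference, a minimal sketch of that closure-based constructor (assuming an existing SparkContext `sc`; the discovery addresses are placeholders, not from this thread):

```scala
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.IgniteContext
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder

// The closure is serialized to the driver and every worker, so each of
// them builds an identical IgniteConfiguration -- there is no XML file
// to keep in sync across Spark nodes.
val ic = new IgniteContext(sc, () => {
  val ipFinder = new TcpDiscoveryVmIpFinder()
  ipFinder.setAddresses(java.util.Arrays.asList(
    "host1:47500..47509", "host2:47500..47509")) // placeholder hosts
  new IgniteConfiguration()
    .setClientMode(true) // nodes embedded in Spark join as clients
    .setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(ipFinder))
})
```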

However, it looks like there is room for improvement; I created a ticket:
https://issues.apache.org/jira/browse/IGNITE-4593

-Val




Re: Running Spark app in cluster!

Posted by what0124 <j....@gmail.com>.
Possibly. Any ideas on how to make sure the clients (driver, workers, etc.)
reuse the same configuration?




Re: Running Spark app in cluster!

Posted by vkulichenko <va...@gmail.com>.
The path to the configuration file you provided will be used on all nodes,
both driver and workers, so the actual file must be replicated. It looks like
you have different versions of it on different Spark nodes. Can that be the case?
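One way to guarantee every node reads the same file is to let Spark ship it with the job; the paths and class/JAR names below are illustrative, not from this thread:

```shell
# --files copies the XML into each executor's working directory on YARN,
# so the driver and all workers read the same version of the file.
spark-submit \
  --master yarn --deploy-mode cluster \
  --files /local/path/example-cache.xml \
  --class RDDProducer spark-ignite-assembly.jar

# Inside the app, refer to the shipped copy by its bare name:
#   new IgniteContext(sc, "example-cache.xml")
```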

-Val




Re: Running Spark app in cluster!

Posted by what0124 <j....@gmail.com>.
OK, so I have 4 standalone nodes (servers with the default example-cache.xml)
and a client from which I submit the job through Spark/YARN using the same
example-cache.xml, but with clientMode set to true. How do I make sure that
the workers that are started have that same configuration? Do I add the IPs
to the configuration?




Re: Running Spark app in cluster!

Posted by vkulichenko <va...@gmail.com>.
Are you sure the configuration change is properly applied on the Spark worker
nodes? With this property set, they must start Ignite nodes in client mode.
The one client you do have is most likely the one running on the driver.

-Val




Re: Running Spark app in cluster!

Posted by what0124 <j....@gmail.com>.
Thanks for responding. I tried that, and my topology is still adding servers
and I get incorrect results. I also changed backups to 0 in the config file,
and nothing seems to work. I don't think it should be this hard; I don't know
what I'm overlooking.

These are the server nodes I start. When I run a spark-submit job
(RDDProducer), the topology adds servers with 1 client, then no client, and
then the number of servers decreases:
<http://apache-ignite-users.70518.x6.nabble.com/file/n10141/nodes.png>






Re: Running Spark app in cluster!

Posted by vkulichenko <va...@gmail.com>.
Try adding this to the configuration file to force client mode for the nodes
started within Spark:

<property name="clientMode" value="true"/>

Make sure not to do this for your standalone server nodes, of course.
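For context, that property goes on the IgniteConfiguration bean in the Spring XML, roughly like this (a sketch modeled on the standard example configs, not the poster's exact file):

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Nodes started from Spark join the topology as clients only. -->
    <property name="clientMode" value="true"/>
    <!-- ...discovery and cache configuration as in example-cache.xml... -->
</bean>
```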

-Val




Re: Running Spark app in cluster!

Posted by what0124 <j....@gmail.com>.
Yes, I pass the example XML, like this:

object RDDProducer extends App {
  val conf = new SparkConf().setAppName("SparkIgnitePro")
  val sc = new SparkContext(conf)
  val ic = new IgniteContext(sc, "../examples/config/example-cache.xml")
  val sharedRDD = ic.fromCache[Integer, Integer]("a")
  val data = Array(1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9,0)
  sharedRDD.savePairs(sc.parallelize(data, 10).map(i => (i, 1)))
  val testRDD = ic.fromCache[Integer, Integer]("test")
  println("First COUNT is:::::::" + testRDD.count())
}

object RDDConsumer extends App {
  val conf = new SparkConf().setAppName("SparkIgniteCon")
  val sc = new SparkContext(conf)
  val ic = new IgniteContext(sc, "../examples/config/example-cache.xml")
  val sharedRDD = ic.fromCache[Integer, Integer]("test")
  println("The count is:::::::::::: " + sharedRDD.count())
}





Re: Running Spark app in cluster!

Posted by vkulichenko <va...@gmail.com>.
Hi,

Can you show how you create the IgniteContext? Are you using XML, or creating
the IgniteConfiguration in code?

-Val




Re: Running Spark app in cluster!

Posted by what0124 <j....@gmail.com>.
Actually, yes. I start with 4 servers, then once I submit the job it detects
1 client and adds more servers... is this behavior expected? Should I just
have the data replicated on all my nodes?




Re: Running Spark app in cluster!

Posted by vkulichenko <va...@gmail.com>.
Are you sure the nodes are discovering each other and there are no topology
changes in the middle? It sounds like you're sporadically losing some data,
which can happen when you lose too many nodes at a time. Can you try changing
the cache mode to REPLICATED or increasing the number of backups?
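For example, in the cache configuration section of the XML (a sketch; the cache name "partitioned" is taken from the earlier snippet in this thread):

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="partitioned"/>
    <!-- Either replicate every entry to every server node... -->
    <property name="cacheMode" value="REPLICATED"/>
    <!-- ...or stay PARTITIONED and keep one backup copy per entry:
    <property name="cacheMode" value="PARTITIONED"/>
    <property name="backups" value="1"/>
    -->
</bean>
```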

-Val




Re: Running Spark app in cluster!

Posted by what0124 <j....@gmail.com>.
Sure! I'm trying to set up Spark and Ignite under CDH for a shared deployment
and run this example: https://github.com/knoldus/spark-ignite

What I have done:
1. Downloaded the Ignite binaries and set IGNITE_HOME
2. Added the library dependencies (ignite-core, ignite-spark and
ignite-spring) and built the app using sbt assembly
3. Added the JARs to the Spark classpath (spark-env.sh)
4. Started the Ignite nodes using ./bin/ignite.sh (except on the master)
5. Submitted the Spark job (spark-submit --master yarn --deploy-mode
cluster ... etc.)

It successfully creates 1024 partitions and creates the cache, but when
retrieving the RDD the results are not consistent. For example:

//producer
...
val data = Array(1,2,3,4,5,6,7,8,9,10)
sharedRDD.savePairs(sc.parallelize(data, 10).map(i => (i, 1)))

//consumer
...
val sharedRDD = ic.fromCache[Integer, Integer]("partitioned")
println("The count is:::::::::::: " + sharedRDD.count())

The count is sometimes 4, other times 10, etc. I don't know if there is some
configuration setting I'm missing in Cloudera, or if some locks are needed
when reading and writing to the cache. Any suggestions would be appreciated.
Thanks!


 





Re: Running Spark app in cluster!

Posted by vkulichenko <va...@gmail.com>.
Hi,

Can you provide more details? What does the deployment look like, what are
you doing, what is the result, and why is it not what you expect, etc.?

-Val


