Posted to dev@ignite.apache.org by "Franciscus, Naden" <Na...@team.telstra.com> on 2016/05/02 08:32:35 UTC

Spark/YARN

Hey guys,

So it looks like using Spark/Ignite on YARN together simply doesn't work.

Many of us have Hadoop appliances where we aren't allowed to install anything on the nodes, so the only option is YARN, which,
barring a few bugs, seems to work okay. But the IgniteContext within Spark doesn't allow you to read configuration files from YARN.

So since you allow users to pass in an IgniteConfiguration we have tried to manually set configuration on the POJOs:
https://github.com/apache/ignite/blob/master/modules/spark/src/main/scala/org/apache/ignite/spark/IgniteContext.scala

But during any distributed Spark operation, Spark will attempt to serialise this object, which is not possible since most of the classes contained
within IgniteConfiguration (e.g. TcpDiscoverySpi) are not serializable.

I am going to go through and see how many classes would need to be marked serializable (could be dozens), but a call will need to be made:

 1.  Mark everything within IgniteConfiguration as serializable.
 2.  Force ALL users of IgniteContext to read config either from HDFS or from a local filesystem. Both would go through the Spring layer.

What's the best way to get a decision on this?

Cheers,
Naden
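
The serialization problem described in this message can be reproduced without Ignite or Spark. The following is a minimal sketch under stated assumptions: FakeDiscoverySpi and FakeConfig are hypothetical stand-ins (they are not Ignite classes), and plain Java serialization stands in for what Spark does when it ships a closure. It contrasts a closure that captures an already-built configuration with a factory closure that builds the configuration where it runs.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-ins for illustration only; these are NOT Ignite classes.
class FakeDiscoverySpi(val port: Int)        // not Serializable, like an SPI
class FakeConfig(val spi: FakeDiscoverySpi)  // holds a non-serializable component

object SerializationSketch {
  // Returns true if Java serialization accepts the object.
  def canSerialize(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val cfg = new FakeConfig(new FakeDiscoverySpi(47500))

    // Captures the already-built config, so the config must be serialized too.
    val capturing: () => FakeConfig = () => cfg

    // Captures nothing; the config is only built when the function is called.
    val factory: () => FakeConfig = () => new FakeConfig(new FakeDiscoverySpi(47500))

    println(s"config object:     ${canSerialize(cfg)}")
    println(s"capturing closure: ${canSerialize(capturing)}")
    println(s"factory closure:   ${canSerialize(factory)}")
  }
}
```

The design point is that a factory function can travel to the executors even when the object it produces cannot, which is one possible alternative to marking every configuration class serializable.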

Re: Spark/YARN

Posted by vijayendra bhati <ve...@yahoo.com.INVALID>.

Hi Naden,

I am just wondering why you want to start the Ignite node in YARN. You can start an independent Ignite node instead. What benefit do you get by making it part of the YARN cluster? I am asking because I had the same questions, but later on I switched to starting independent Ignite nodes on the same machines where my node managers are running. I guess this will also give data locality etc.

Also, you can create an RDD and save data directly to the cache using the RDD; you don't need to explicitly invoke ignite() or start(). I can share the code if you want.

Regards,
Vij

Re: Spark/YARN

Posted by "Franciscus, Naden" <Na...@team.telstra.com>.

Hey Alexey,

Sorry I meant HDFS :)

It would be good for IgniteContext to have the option to read
configuration from HDFS.
Again many users of Hadoop aren't going to have the ability to modify data
nodes to install Ignite. But I will raise that as a separate issue.

The big issue is that there is a lot of code like this in documentation
and elsewhere:
ignite.cache("myCache").put(personId, person);

Which we can't use in any RDD/DF map, filter etc. because neither
Ignite nor IgniteCache is serializable.

Cheers,
Naden
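
A common way around a non-serializable handle is to look it up inside the function that runs on each partition rather than capturing it from the driver. The sketch below only illustrates that pattern and makes several assumptions: FakeCacheClient and ClientHolder are hypothetical stand-ins (not Ignite classes), and a plain Scala function plays the role of the function passed to rdd.foreachPartition. In a real job the lookup might be something like Ignition.ignite(), assuming a node has already been started in that JVM.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for a non-serializable handle such as Ignite or IgniteCache.
class FakeCacheClient {
  private val store = TrieMap.empty[Long, String]
  def put(key: Long, value: String): Unit = store.update(key, value)
  def get(key: Long): Option[String] = store.get(key)
}

// One client per JVM, created lazily on first access. Closures refer to this
// singleton by name, so no client instance becomes part of a serialized closure.
object ClientHolder {
  lazy val client: FakeCacheClient = new FakeCacheClient
}

object PerPartitionSketch {
  // Plays the role of the function passed to rdd.foreachPartition.
  val writePartition: Iterator[(Long, String)] => Unit = rows => {
    val client = ClientHolder.client // looked up where the code runs, not captured
    rows.foreach { case (k, v) => client.put(k, v) }
  }

  def main(args: Array[String]): Unit = {
    // Two "partitions" processed locally, in place of a real Spark job.
    val partitions = Seq(Seq(1L -> "alice", 2L -> "bob"), Seq(3L -> "carol"))
    partitions.foreach(p => writePartition(p.iterator))
    println(ClientHolder.client.get(3L))
  }
}
```

Working per partition rather than per element also means the lookup happens once per partition instead of once per record.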

On 4/05/2016, 5:42 AM, "Alexey Goncharuk" <al...@gmail.com>
wrote:

>Hi Naden,
>
>> But the IgniteContext within Spark doesn't allow you to read
>> configuration files from YARN.
>
>I am a little bit confused. Ignite can be configured via basic Spring XML,
>and you can definitely read those XML files off HDFS or any other source.
>Is there any reason why XML does not work for you?
>
>Now it is impossible to make IgniteConfiguration serializable because it
>contains such components as SPIs, which are non-serializable by their
>nature.

Re: Spark/YARN

Posted by Alexey Goncharuk <al...@gmail.com>.

Hi Naden,

> But the IgniteContext within Spark doesn't allow you to read configuration
> files from YARN.

I am a little bit confused. Ignite can be configured via basic Spring XML,
and you can definitely read those XML files off HDFS or any other source.
Is there any reason why XML does not work for you?

Now it is impossible to make IgniteConfiguration serializable because it
contains such components as SPIs, which are non-serializable by their nature.

Re: Spark/YARN

Posted by "Franciscus, Naden" <Na...@team.telstra.com>.

Just as a follow up to this:

IgniteConfiguration has properties (e.g. CountDownLatch) which aren't serializable, and the HDFS Path is not serializable either, so there's no clean way to
construct an IgniteContext without breaking backwards compatibility.

Maybe update this page:
https://apacheignite-fs.readme.io/docs/ignitecontext-igniterdd

It could mention that you can't use an IgniteContext within any Spark map, filter etc., so code such as the following (using BinaryObjects) will not work:

val pairRdd = rdd.map(x => {
  val builder = igniteContext.ignite.binary.builder("DT1")
  builder.setField("id", x.toString)
  builder.setField("name", "test-" + x.toString)
  val binObj = builder.build
  binObj
}).zipWithIndex.map(r => (r._2, r._1))

Shame that Ignite doesn’t really work on Spark. Hopefully one day!

