Posted to user@ignite.apache.org by Ranjit Sahu <ra...@gmail.com> on 2018/01/17 12:23:43 UTC

Discovery node port range

Hi,

We are using the Ignite key-value store in Spark embedded mode. Because
resources are managed by YARN, when we request 30 executor nodes, as many as
20 of them may land on one server, which means we start 20 Ignite server
nodes on that server. Currently the discovery port range is set to 10.
Should this be changed to 20 or more?

The communication SPI port range is set to 100.

Any other thoughts?

Thanks,
Ranjit

Re: Discovery node port range

Posted by vkulichenko <va...@gmail.com>.

By standalone cluster I just mean a regular Ignite cluster running
independently from Spark. The easiest way to start a node is to run the
ignite.sh script with a proper configuration file.

Once you switch IgniteContext to standalone mode, all nodes started within
Spark processes will run in client mode and will only be used to access the
cluster. All the data will be on server nodes, so Spark lifecycle will never
cause rebalancing or data loss.

-Val



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Discovery node port range

Posted by Ranjit Sahu <ra...@gmail.com>.

What is the best way to set up a standalone Ignite cluster for Spark? I
think that with standalone mode we need to deploy Ignite separately on each
worker node. Can you send me a reference I can look at? If we decide to go
with standalone, can I still load data from my Spark app, perhaps as a
client?

On Thu, 1 Feb 2018 at 10:16 PM, vkulichenko <va...@gmail.com>
wrote:

> Ranjit,
>
> Generally, removing and adding nodes in an unpredictable way (which happens
> in embedded mode because we basically rely on Spark here) is a very bad
> anti-pattern when working with distributed data. It can have serious
> performance implications and can cause data loss.
>
> Data nodes are supposed to be relatively stable, so a standalone Ignite
> cluster is the correct way to architect this.
>
> -Val

Re: Discovery node port range

Posted by vkulichenko <va...@gmail.com>.

Ranjit,

Generally, removing and adding nodes in an unpredictable way (which happens
in embedded mode because we basically rely on Spark here) is a very bad
anti-pattern when working with distributed data. It can have serious
performance implications and can cause data loss.

Data nodes are supposed to be relatively stable, so a standalone Ignite
cluster is the correct way to architect this.
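
To illustrate why backups alone may not cover unpredictable node removal,
here is a toy model in Python. It is only a sketch under stated assumptions:
the function name and the round-robin placement of copies are hypothetical
and much simpler than Ignite's real affinity function, and the numbers (30
nodes, 1024 partitions, 1 backup) are illustrative values. It counts the
partitions whose every copy sat on a node that left at the same time.

```python
def lost_partitions(node_count, partitions, backups, departed):
    """Count partitions whose every copy lived on a departed node.

    Assumption: partition p keeps its primary on node p % node_count and
    its backups on the next `backups` nodes. Ignite's real affinity
    function places copies differently; this is only a toy model.
    """
    departed = set(departed)
    lost = 0
    for p in range(partitions):
        # All nodes holding a copy (primary + backups) of partition p.
        owners = {(p + i) % node_count for i in range(backups + 1)}
        if owners <= departed:
            lost += 1  # no surviving copy to rebalance from
    return lost


if __name__ == "__main__":
    # One node leaves: the backup copy survives, nothing is lost.
    print("one node leaves:", lost_partitions(30, 1024, 1, [5]))
    # 20 co-located nodes leave together: many partitions lose both copies.
    print("20 nodes leave:", lost_partitions(30, 1024, 1, range(20)))
```

In this model a single departure is absorbed by the backup, but losing many
nodes at once (for example all executors on one server) removes both copies
of a large share of partitions before any rebalancing can happen.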

-Val




Re: Discovery node port range

Posted by Ranjit Sahu <ra...@gmail.com>.

Hi Val,

What issues have you discovered with embedded-mode deployment? If we lose
nodes in Spark, I would expect backups/replication to take care of that.
There is an overhead from rebalancing, but how large is it compared to
running a standalone cluster, which we may not always have?

Thanks,
Ranjit

On Wed, Jan 24, 2018 at 1:55 AM, vkulichenko <va...@gmail.com>
wrote:

> Ranjit,
>
> Then it sounds like you're recreating embedded mode on your own :) In this
> case the deprecation will not affect you, of course, but this is still NOT
> a recommended way to use Ignite with Spark.
>
> -Val

Re: Discovery node port range

Posted by vkulichenko <va...@gmail.com>.

Ranjit,

Then it sounds like you're recreating embedded mode on your own :) In this
case the deprecation will not affect you, of course, but this is still NOT
a recommended way to use Ignite with Spark.

-Val




Re: Discovery node port range

Posted by Ranjit Sahu <ra...@gmail.com>.

Thanks Val. We are not using the Ignite RDD; instead we build the Ignite
cluster within Spark with custom code and use the key-value store.
We use static IP discovery while doing so.

What we do is read the data in Avro format from Hadoop, start the Ignite
node in the driver first, and use the driver IP along with a few executor
IPs for node discovery. Once the cluster topology is built, we load the data
into the cache and use the SQL interface for lookups.

Do you think we will be impacted by the 2.4 deprecation decision?
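
As an illustration of the static IP setup described above, here is a minimal
Python sketch that builds the address list for a static IP finder. The
function name, the example IPs, and the chosen range of 20 are hypothetical;
the `host:port..port` range notation and the base port 47500 are, to my
understanding, what Ignite's static IP finder and discovery SPI use, but
please check them against the documentation for your version.

```python
def discovery_addresses(hosts, base_port=47500, port_range=20):
    """Build 'host:first..last' entries for a static IP finder.

    Assumption: a range of N covers ports base_port .. base_port + N - 1.
    Duplicate hosts are dropped, since several executors can share a server.
    """
    last_port = base_port + port_range - 1
    unique_hosts = list(dict.fromkeys(hosts))  # keeps first-seen order
    return [f"{host}:{base_port}..{last_port}" for host in unique_hosts]


if __name__ == "__main__":
    # Hypothetical driver IP followed by a few executor IPs.
    hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.2", "10.0.0.3"]
    for address in discovery_addresses(hosts):
        print(address)
```

The port range in each entry should be at least as wide as the number of
server nodes that can end up on a single host, otherwise nodes bound to
higher ports on that host will not be probed during discovery.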

Thanks,
Ranjit

On Wed, 17 Jan 2018 at 11:29 PM, vkulichenko <va...@gmail.com>
wrote:

> Ranjit,
>
> Embedded mode in Spark RDD will be deprecated in 2.4 which is about to be
> released. My recommendation would be to use standalone mode instead.
>
> -Val

Re: Discovery node port range

Posted by vkulichenko <va...@gmail.com>.

Ranjit,

Embedded mode in Spark RDD will be deprecated in 2.4 which is about to be
released. My recommendation would be to use standalone mode instead.

-Val




Re: Discovery node port range

Posted by "ilya.kasnacheev" <il...@gmail.com>.

Hello Ranjit!

Running 20 nodes on one server is wasteful. I expect you would need a port
range of at least 20 to run 20 nodes.
This of course changes if each node runs under its own IP address (as
happens with containers).
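
The arithmetic behind this can be sketched with a small Python model. It is
not Ignite's actual code: the function name is hypothetical, and it assumes
that each server node on a host probes ports sequentially from the local
port and takes the first free one among `port_range` candidates (whether a
range of N yields exactly N ports should be checked against the discovery
SPI documentation).

```python
DISCOVERY_PORT = 47500  # assumed base discovery port for this example


def start_nodes(node_count, local_port, port_range):
    """Return (bound_ports, failed_count) for nodes started on one host.

    Assumption: each node tries local_port, local_port + 1, ... and binds
    the first free port among `port_range` candidates.
    """
    taken = set()
    failed = 0
    for _ in range(node_count):
        candidates = range(local_port, local_port + port_range)
        free = next((p for p in candidates if p not in taken), None)
        if free is None:
            failed += 1  # every port in the range is already in use
        else:
            taken.add(free)
    return sorted(taken), failed


if __name__ == "__main__":
    for port_range in (10, 20):
        bound, failed = start_nodes(20, DISCOVERY_PORT, port_range)
        print(f"range={port_range}: {len(bound)} bound, {failed} failed")
```

Under these assumptions a range of 10 lets only 10 of the 20 nodes bind,
while a range of 20 accommodates all of them.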

Also, client nodes don't need to bind to a fixed port. Any chance you could
make some of them clients?

Regards,


