You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@ignite.apache.org by Denis Magda <dm...@gridgain.com> on 2015/11/02 10:00:08 UTC

Re: IP based discovery, dynamic nodes, and start() deadlock

Hi Erik,

Discovery logic works the same for any kind of IP finders.
IP finder is just a way to pass a list of TCP/IP addresses to use during the
time a node joins a cluster. 
Muticast IP finder forms such a list automatically for you while in case of
static IP finder you do this manually.

If you properly setup the static IP finder for two nodes and see that they
are blocked on RES_WAIT then please do the following:

- make sure that each node can reach each other over network. The ports that
the nodes are bound to might be closed by your firewall. In any case there
is always a way to set ports list to use for an IP finder:
https://apacheignite.readme.io/docs/cluster-config#multicast-and-static-ip-based-discovery

- if you're sure that there is no any network related issue and each node
can reach each other then please provide us with a reproducible example.
Probably this is a bug and we will be able to fix it.

In any case if you properly setup the static IP finder then you shouldn't
worry about any leader selection or any other discovery related
responsibilities. This is done out of the box.

Answering on some of your additional questions.

> Just to confirm, what would the Ignite topology be after the following
> sequences? 
> Node A starts (no peers) 
> Node B starts (no peers) 
> Node C starts, connects to A 
> Node D starts, connects to B 
> -- I assume here we will have two isolated clusters 

If A and C is from one network segment and B and D is from the other then
you'll have two clusters. If all the nodes from one network segment then
you'll have one cluster.

> Node E starts, connects to A, B 
> -- Do we now get a unified topology spanning all nodes? 

The same as above.

Hope this helps.

Regards,
Denis



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/IP-based-discovery-dynamic-nodes-and-start-deadlock-tp1786p1805.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: IP based discovery, dynamic nodes, and start() deadlock

Posted by Denis Magda <dm...@gridgain.com>.
Hi Erik,

Good point. I've added a warning callout to the end of this section 
http://apacheignite.gridgain.org/docs/cluster-config#static-ip-based-discovery

Is it clear for you from?

P.S.
In any case if you want to contribute you can find some interesting 
tasks in Ignite JIRA ;)

--
Denis

On 11/2/2015 10:13 PM, Erik Bunn wrote:
>
> Hullo Denis
>
> > Hope my answer made the things clearer for you, Denis
>
> Perfectly clear, thank you! So my attempt for economy with the 
> addresses misfired..
> Thanks for the additional notes as well, helps refine the code as I 
> test more.
>
> Is it possible to add short mention about this to the cluster-config / 
> Static IP Based Discovery chapter, perhaps just the Java sample 
> comments? Looks like I could just add a pull request, will look into it.
>
> Cheers,
> //eb


Re: IP based discovery, dynamic nodes, and start() deadlock

Posted by Erik Bunn <eb...@basen.net>.
Hullo Denis

> Hope my answer made the things clearer for you, Denis

Perfectly clear, thank you! So my attempt for economy with the addresses misfired..
Thanks for the additional notes as well, helps refine the code as I test more.

Is it possible to add short mention about this to the cluster-config / Static IP Based Discovery chapter, perhaps just the Java sample comments? Looks like I could just add a pull request, will look into it.

Cheers,
//eb




Re: IP based discovery, dynamic nodes, and start() deadlock

Posted by Denis Magda <dm...@gridgain.com>.
Hello Erik,

I got to the bottom of your issue, thanks for sharing the code.

The reason why you have the deadlock is because the static IP finders 
don't have a local node's address in their list.
If an IP finder has a local address in the list then the discovery will 
allow the node with this address to form a single-node cluster.

When you set the IP finder the way below the issue will disappear on 
your side.

TcpDiscoveryVmIpFinder ipFinder =new TcpDiscoveryVmIpFinder();

ArrayList<String> addresses =new ArrayList<>();
addresses.add("127.0.0.1:5000");
addresses.add("127.0.0.1:6000");


So as you see the IP finder has to contain addresses of all the nodes 
(including local node's one) that must form a cluster.


Next as I see you bind each node to a specific address. As you may know 
in addition to DiscoverySpi there is a CommunicationSpi that is heavily 
used by cluster nodes when one needs to reach the other directly.
You may want/need CommunicationSpi to bind to a specific address as well.
If this is your case I would recommend you to set the host part of the 
address through IgniteConfiguration

IgniteConfiguration icfg =new IgniteConfiguration();

icfg.setLocalHost("127.0.0.1");


Host address will be propagated to both discovery and communication SPIs.
After that you just need to change discovery and communication local 
ports to bind to if required.

TcpDiscoverySpi spi =new TcpDiscoverySpi();
spi.setLocalPort(6000);

TcpCommunicationSpi cSpi =new TcpCommunicationSpi();
cSpi.setLocalPort(48000);


Finally, answering on your question
>> Just to confirm, what would the Ignite topology be after the following
>> sequences?
>> Node A starts (no peers)
>> Node B starts (no peers)
>> Node C starts, connects to A
>> Node D starts, connects to B
>> -- I assume here we will have two isolated clusters
> If A and C is from one network segment and B and D is from the other then
> you'll have two clusters. If all the nodes from one network segment then
> you'll have one cluster.
> I omitted a bit of context there: the above sequences, with strictly static IP discovery. We shouldn't have any cross-talk yet in that scenario, right?

If IP finders of all the nodes (A, B, C and D) have addresses of each 
other then Ignite will assemble 4 nodes cluster. If IP finders of nodes 
A and B does NOT contain addresses of nodes C and D and vice verse then 
you have two clusters: A and B cluster, C and D cluster. Hope my answer 
made the things clearer for you, Denis
On 11/2/2015 2:10 PM, Erik Bunn wrote:
> Hello Denis
>
>
>
>
>
>
> On 02/11/15 10:00, "Denis Magda" <dm...@gridgain.com> wrote:
>
>> - make sure that each node can reach each other over network. The ports that
>> the nodes are bound to might be closed by your firewall. In any case there
>> is always a way to set ports list to use for an IP finder:
>> https://apacheignite.readme.io/docs/cluster-config#multicast-and-static-ip-based-discovery
> This is confirmed and not the issue here.
>
>> - if you're sure that there is no any network related issue and each node
>> can reach each other then please provide us with a reproducible example.
>> Probably this is a bug and we will be able to fix it.
> My trivial sample class is listed below. It does have log4j/guava/jcommander dependencies, so I've also put it up at https://github.com/ebudan/ignite-static-test
>
>> In any case if you properly setup the static IP finder then you shouldn't
>> worry about any leader selection or any other discovery related
>> responsibilities. This is done out of the box.
> Happy to hear that. I'm pushing hard for a completely programmatic setup, so maybe I have omitted an option that resolves the RES_WAIT deadlock. If so, maybe it will be apparent in the code snippet below. (The test commands below run on localhost, but I tested on separate servers as well.)
>
>>> Just to confirm, what would the Ignite topology be after the following
>>> sequences?
>>> Node A starts (no peers)
>>> Node B starts (no peers)
>>> Node C starts, connects to A
>>> Node D starts, connects to B
>>> -- I assume here we will have two isolated clusters
>> If A and C is from one network segment and B and D is from the other then
>> you'll have two clusters. If all the nodes from one network segment then
>> you'll have one cluster.
> I omitted a bit of context there: the above sequences, with strictly static IP discovery. We shouldn't have any cross-talk yet in that scenario, right?
>
> Thanks!
> //eb
>
>
> Sample code to reproduce (https://github.com/ebudan/ignite-static-test):
>
> package net.memecry.ihw;
>
> import java.util.ArrayList;
> import java.util.Collection;
> import java.util.List;
> import java.util.concurrent.atomic.AtomicInteger;
> import org.apache.ignite.Ignite;
> import org.apache.ignite.Ignition;
> import org.apache.ignite.configuration.IgniteConfiguration;
> import org.apache.ignite.lang.IgniteCallable;
> import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
> import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
> import org.apache.log4j.LogManager;
> import org.apache.log4j.Logger;
> import com.beust.jcommander.JCommander;
> import com.beust.jcommander.Parameter;
> import com.beust.jcommander.ParameterException;
> import com.google.common.net.HostAndPort;
>
> /*
>   * A sample cli app to test embedded Ignite and static IP discovery for
>   * cluster setup, and to demonstrate cluster deadlock. (Ignite version 1.4.0,
> * also confirmed for 1.5.0-SNAPSHOT.)
>   *
>   * To run:
>   *     java -jar build/libs/HelloIgnite-all-1.0.jar -a ADDRESS:PORT -p ADDRESS:PORT,ADDRESS:PORT,... [--task]
>   *
>   * Sets a node up at the discovery address+port specified by option -a, and
>   * connects to some of the nodes running at the address+port specified by -p.
>   * Without further parameters, waits for connections; if --task is specified,
>   * launches a sample compute task from Ignite documentation.
>   *
>   * Ideally, to test with two nodes:
>   *
>   *     java -jar build/libs/HelloIgnite-all-1.0.jar -a 127.0.0.1:5000 -p 127.0.0.1:6000
>   *     java -jar build/libs/HelloIgnite-all-1.0.jar -a 127.0.0.1:6000 -p 127.0.0.1:5000 --task
>   *
>   * In practice, the above deadlocks while the nodes wait for each other's ready signal.
>   * In order to successfully start up, one of the nodes must be launched without peers:
>   *
>   *     java -jar build/libs/HelloIgnite-all-1.0.jar -a 127.0.0.1:5000
>   *     java -jar build/libs/HelloIgnite-all-1.0.jar -a 127.0.0.1:6000 -p 127.0.0.1:5000 --task
>   *
>   */
> public class Main {
>
>      static final Logger log = LogManager.getLogger( Main.class );
>
>      @Parameter( names = { "-t", "--task" }, description = "Perform sample task." )
>      private boolean m_task;
>
>      @Parameter( names = { "-a", "--addr" }, description = "Own IP:port address", required = true )
>      private String m_addr;
>
>      @Parameter( names = { "-p", "--peers" }, description = "Comma separated list of IP:port of Ignite peers. Will use given port for Ignite, port+1 for discovery.", variableArity = true )
>      private List<String> m_peers = new ArrayList<>();
>
>      static int s_count = 0;
>
>      AtomicInteger m_received = new AtomicInteger( 0 );
>
>      private Ignite m_ignite;
>
>      private void go() {
>
>      HostAndPort hp =
>      HostAndPort.fromString( m_addr ).withDefaultPort( 5000 ).requireBracketsForIPv6();
>
>      IgniteConfiguration icfg = new IgniteConfiguration();
>      TcpDiscoverySpi spi = new TcpDiscoverySpi();
>      spi.setLocalAddress( hp.getHostText() );
>      spi.setLocalPort( hp.getPort() );
>      if( m_peers.size() > 0 ) {
>          TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
>          ipFinder.setAddresses( m_peers );
>          spi.setIpFinder( ipFinder );
>          log.debug( "Searching for peers: " + m_peers );
>      }
>      icfg.setDiscoverySpi( spi );
>      log.debug( "Discovery on " + hp.getHostText() + ":" + hp.getPort() );
>          
>      log.debug( "Starting Ignite" );
>      m_ignite = Ignition.start(icfg);
>      log.debug( "Ignite started." );
>
>      if( m_task ) {
>          try {
>              Thread.sleep( 3000 );
>          } catch( InterruptedException e ) {
>          }
>          doSampleTask();
>      }
>
>      // Wait forever, in order to look at cluster establishment.
>      while( true ) {
>          try {
>              synchronized( this ) {
>                  wait();
>              }
>          } catch( InterruptedException e ) {
>              
>          }
>      }
>      }
>
>      // Sample task from Ignite tutorial.
>      private void doSampleTask() {
>      log.debug( "Launching a task." );
>      Collection<IgniteCallable<Integer>> calls = new ArrayList<>();
>      for (final String word : "Count characters using callable".split(" "))
>          calls.add( () -> {
>          log.debug( "Counting " + word.length() + " chars" );
>          return word.length();
>      });
>      Collection<Integer> res = m_ignite.compute().call(calls);
>      int sum = res.stream().mapToInt(Integer::intValue).sum();
>      log.debug( "Total number of characters is '" + sum + "'." );
>      }
>
>      
>      public static void main( String[] args ) {
>
>      Main app = new Main();
>      JCommander jc = new JCommander( app );
>      try {
>          jc.parse( args );
>          app.go();
>      } catch( ParameterException e ) {
>          jc.usage();
>          System.exit( 1 );
>      }
>      }
> }
>

Re: IP based discovery, dynamic nodes, and start() deadlock

Posted by Erik Bunn <eb...@basen.net>.
Hello Denis






On 02/11/15 10:00, "Denis Magda" <dm...@gridgain.com> wrote:

>- make sure that each node can reach each other over network. The ports that
>the nodes are bound to might be closed by your firewall. In any case there
>is always a way to set ports list to use for an IP finder:
>https://apacheignite.readme.io/docs/cluster-config#multicast-and-static-ip-based-discovery

This is confirmed and not the issue here. 

>- if you're sure that there is no any network related issue and each node
>can reach each other then please provide us with a reproducible example.
>Probably this is a bug and we will be able to fix it.

My trivial sample class is listed below. It does have log4j/guava/jcommander dependencies, so I've also put it up at https://github.com/ebudan/ignite-static-test

>In any case if you properly setup the static IP finder then you shouldn't
>worry about any leader selection or any other discovery related
>responsibilities. This is done out of the box.

Happy to hear that. I'm pushing hard for a completely programmatic setup, so maybe I have omitted an option that resolves the RES_WAIT deadlock. If so, maybe it will be apparent in the code snippet below. (The test commands below run on localhost, but I tested on separate servers as well.)

>>Just to confirm, what would the Ignite topology be after the following
>> sequences? 
>> Node A starts (no peers) 
>> Node B starts (no peers) 
>> Node C starts, connects to A 
>> Node D starts, connects to B 
>> -- I assume here we will have two isolated clusters 
>
>If A and C is from one network segment and B and D is from the other then
>you'll have two clusters. If all the nodes from one network segment then
>you'll have one cluster.

I omitted a bit of context there: the above sequences, with strictly static IP discovery. We shouldn't have any cross-talk yet in that scenario, right?

Thanks!
//eb


Sample code to reproduce (https://github.com/ebudan/ignite-static-test):

package net.memecry.ihw;

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.lang.IgniteCallable;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import com.beust.jcommander.JCommander;
import com.beust.jcommander.Parameter;
import com.beust.jcommander.ParameterException;
import com.google.common.net.HostAndPort;

/*
 * A sample cli app to test embedded Ignite and static IP discovery for 
 * cluster setup, and to demonstrate cluster deadlock. (Ignite version 1.4.0, 
* also confirmed for 1.5.0-SNAPSHOT.)
 * 
 * To run:
 *     java -jar build/libs/HelloIgnite-all-1.0.jar -a ADDRESS:PORT -p ADDRESS:PORT,ADDRESS:PORT,... [--task] 
 * 
 * Sets a node up at the discovery address+port specified by option -a, and 
 * connects to some of the nodes running at the address+port specified by -p. 
 * Without further parameters, waits for connections; if --task is specified, 
 * launches a sample compute task from Ignite documentation. 
 * 
 * Ideally, to test with two nodes:
 * 
 *     java -jar build/libs/HelloIgnite-all-1.0.jar -a 127.0.0.1:5000 -p 127.0.0.1:6000
 *     java -jar build/libs/HelloIgnite-all-1.0.jar -a 127.0.0.1:6000 -p 127.0.0.1:5000 --task
 * 
 * In practice, the above deadlocks while the nodes wait for each other's ready signal.
 * In order to successfully start up, one of the nodes must be launched without peers:
 * 
 *     java -jar build/libs/HelloIgnite-all-1.0.jar -a 127.0.0.1:5000
 *     java -jar build/libs/HelloIgnite-all-1.0.jar -a 127.0.0.1:6000 -p 127.0.0.1:5000 --task
 * 
 */
public class Main {

    static final Logger log = LogManager.getLogger( Main.class );

    @Parameter( names = { "-t", "--task" }, description = "Perform sample task." )
    private boolean m_task;

    @Parameter( names = { "-a", "--addr" }, description = "Own IP:port address", required = true )
    private String m_addr;

    @Parameter( names = { "-p", "--peers" }, description = "Comma separated list of IP:port of Ignite peers. Will use given port for Ignite, port+1 for discovery.", variableArity = true )
    private List<String> m_peers = new ArrayList<>();

    static int s_count = 0;

    AtomicInteger m_received = new AtomicInteger( 0 );

    private Ignite m_ignite;

    private void go() {

    HostAndPort hp =
    HostAndPort.fromString( m_addr ).withDefaultPort( 5000 ).requireBracketsForIPv6();

    IgniteConfiguration icfg = new IgniteConfiguration();
    TcpDiscoverySpi spi = new TcpDiscoverySpi();
    spi.setLocalAddress( hp.getHostText() );
    spi.setLocalPort( hp.getPort() );        
    if( m_peers.size() > 0 ) {
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses( m_peers );
        spi.setIpFinder( ipFinder );
        log.debug( "Searching for peers: " + m_peers );
    }
    icfg.setDiscoverySpi( spi );
    log.debug( "Discovery on " + hp.getHostText() + ":" + hp.getPort() );
        
    log.debug( "Starting Ignite" );
    m_ignite = Ignition.start(icfg);
    log.debug( "Ignite started." );

    if( m_task ) {
        try {
            Thread.sleep( 3000 );
        } catch( InterruptedException e ) {
        }
        doSampleTask();
    }

    // Wait forever, in order to look at cluster establishment.
    while( true ) {
        try {
            synchronized( this ) {
                wait();
            }
        } catch( InterruptedException e ) {
            
        }
    }
    }

    // Sample task from Ignite tutorial.
    private void doSampleTask() {
    log.debug( "Launching a task." );
    Collection<IgniteCallable<Integer>> calls = new ArrayList<>();
    for (final String word : "Count characters using callable".split(" "))
        calls.add( () -> { 
        log.debug( "Counting " + word.length() + " chars" );
        return word.length();
    });
    Collection<Integer> res = m_ignite.compute().call(calls);
    int sum = res.stream().mapToInt(Integer::intValue).sum();       
    log.debug( "Total number of characters is '" + sum + "'." );
    }

    
    public static void main( String[] args ) {

    Main app = new Main();
    JCommander jc = new JCommander( app );
    try {
        jc.parse( args );
        app.go();
    } catch( ParameterException e ) {
        jc.usage();
        System.exit( 1 );
    }
    }
}