Posted to dev@giraph.apache.org by "Avery Ching (Created) (JIRA)" <ji...@apache.org> on 2012/01/22 09:21:38 UTC

[jira] [Created] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

RPC port from BasicRPCCommunications should be only a starting port, and retried
--------------------------------------------------------------------------------

                 Key: GIRAPH-128
                 URL: https://issues.apache.org/jira/browse/GIRAPH-128
             Project: Giraph
          Issue Type: Improvement
    Affects Versions: 0.1.0
            Reporter: Avery Ching
            Assignee: Avery Ching


Currently Giraph uses a base port + the task partition to get the RPC port.  This doesn't work well when there are multiple Giraph jobs running simultaneously in the same Hadoop cluster (port conflict).  At the same time, it is nice to use this simple algorithm because it makes it very easy to debug problems (you can find the troublesome mapper from the RPC port number).  I will be proposing a simple scheme to retry with another port.  I will round the total number of mappers up to the next power of 10 (let's call that number Z).  Then I will increment the port number by Z, retrying up to 20 times.  If you have enough ports, this scheme would guarantee that up to 20 mappers / node would be supported.  It should be sufficient for most clusters.  At the same time, we still maintain the easy debugging method, since it's still easy to figure out the mapper partition from the port (port % Z = map partition).
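
To make the arithmetic concrete, here is a minimal sketch of the scheme described above. It is an illustration, not the committed patch: the class name PortScheme, its method names, the base port of 30000 and the worker count of 250 are all assumptions made for the example. Note that port % Z recovers the partition only when the base port is itself a multiple of Z, and that the rounding expression assumes at least one worker.

```java
final class PortScheme {
    /** Rounds numWorkers (assumed >= 1) up to the next power of 10; this is Z. */
    static int portIncrement(int numWorkers) {
        return (int) Math.pow(10, Math.ceil(Math.log10(numWorkers)));
    }

    /** Port tried on a given attempt (attempt 0 is the first try). */
    static int candidatePort(int basePort, int partition, int increment, int attempt) {
        return basePort + partition + attempt * increment;
    }

    /** Recovers the partition; valid only if basePort is a multiple of increment. */
    static int partitionOf(int port, int increment) {
        return port % increment;
    }

    public static void main(String[] args) {
        int basePort = 30000;   // hypothetical base port, a multiple of Z
        int numWorkers = 250;   // hypothetical job size
        int z = portIncrement(numWorkers);  // 1000
        for (int attempt = 0; attempt < 3; attempt++) {
            int port = candidatePort(basePort, 42, z, attempt);
            // Prints 30042, 31042, 32042, each mapping back to partition 42.
            System.out.println(port + " -> partition " + partitionOf(port, z));
        }
    }
}
```

Because every retry adds a whole multiple of Z, the low-order digits of the port stay equal to the partition number regardless of how many attempts were needed.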

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

Posted by "Avery Ching (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Avery Ching updated GIRAPH-128:
-------------------------------

    Attachment: GIRAPH-128.3.patch

Corresponding update to rb diff r4
                

        

[jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191425#comment-13191425 ] 

jiraposter@reviews.apache.org commented on GIRAPH-128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3596/
-----------------------------------------------------------

Review request for giraph.


Summary
-------

Simple handling of port collisions on the same machine while preserving debuggability from the port number alone.  Round up the max number of workers to the next power of 10 and use it as the constant by which to increase the port number.

Added a unit test to ensure it is working correctly.

Fixed 2 minor warnings in
src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java
src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java

by removing 'import java.util.List'.


This addresses bug GIRAPH-128.
    https://issues.apache.org/jira/browse/GIRAPH-128


Diffs
-----

  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1234970 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 1234970 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java 1234970 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/RPCCommunicationsTest.java PRE-CREATION 

Diff: https://reviews.apache.org/r/3596/diff


Testing
-------

Passed local and MR unit tests.


Thanks,

Avery


                

        

[jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195389#comment-13195389 ] 

Hudson commented on GIRAPH-128:
-------------------------------

Integrated in Giraph-trunk-Commit #69 (See [https://builds.apache.org/job/Giraph-trunk-Commit/69/])
    GIRAPH-128: RPC port from BasicRPCCommunications should be only a
starting port, and retried. (aching)

aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236971
Files : 
* /incubator/giraph/trunk/CHANGELOG
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/comm
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/RPCCommunicationsTest.java

                

        

[jira] [Updated] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

Posted by "Avery Ching (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Avery Ching updated GIRAPH-128:
-------------------------------

    Attachment: GIRAPH-128.2.patch

Updated after GIRAPH-124 was committed.
                

        

[jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195378#comment-13195378 ] 

jiraposter@reviews.apache.org commented on GIRAPH-128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3596/
-----------------------------------------------------------

(Updated 2012-01-28 03:01:01.809862)


Review request for giraph.


Changes
-------

Addressed the mockito suggestion.


Summary
-------

Simple handling of port collisions on the same machine while preserving debuggability from the port number alone.  Round up the max number of workers to the next power of 10 and use it as the constant by which to increase the port number.

Added a unit test to ensure it is working correctly.

Fixed 2 minor warnings in
src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java
src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java

by removing 'import java.util.List'.


This addresses bug GIRAPH-128.
    https://issues.apache.org/jira/browse/GIRAPH-128


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1236935 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1236935 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/RPCCommunicationsTest.java PRE-CREATION 

Diff: https://reviews.apache.org/r/3596/diff


Testing
-------

Passed local and MR unit tests.


Thanks,

Avery


                

        

[jira] [Resolved] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

Posted by "Avery Ching (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Avery Ching resolved GIRAPH-128.
--------------------------------

    Resolution: Fixed

Thanks for the review and comments, Jakob.  It passed Hudson, closing.
                

        

[jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

Posted by "Jakob Homan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195275#comment-13195275 ] 

Jakob Homan commented on GIRAPH-128:
------------------------------------

This is a review of GIRAPH-128.2.patch.  When uploading new versions of patches, it's best not to delete the old ones so reviewers can see the progress. JIRA highlights the most current version (although it's always nice to number them as well, as was done here).

* Should
{noformat}+    private static final int MAX_BIND_ATTEMPTS = 20;{noformat} be a configuration variable? It seems like something that would be good to be able to tune.
* Choice of increment:
{noformat}
+        // Simple handling of port collisions on the same machine while
+        // preserving debugability from the port number alone.
+        // Round up the max number of workers to the next power of 10 and use
+        // it as a constant to increase the port number with.
+        int portIncrementConstant =
+            (int) Math.pow(10, Math.ceil(Math.log10(numWorkers)));
{noformat}
This seems like a bit of an odd choice; what motivated it?  Similar jobs (in a small cluster, for instance) may have the same number of workers and keep colliding.  Is there any reason not to just use some random number from a uniform distribution of, say, 100 and use that as the constant?  That may result in fewer collisions.
* {{MinimumIntCombiner.java}} and {{SimpleSumCombiner.java}} have unnecessary whitespace changes.
* RPCCommunicationsTest: Is there any reason not to use a mock here rather than actually extending the Mapper class for the test?  These one-off implementations for tests show up in "Find implementations" actions in the IDE.
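
For readers following the discussion of the bind limit and the increment, the retry loop can be sketched roughly as below. This is a hedged illustration, not the code from the patch: the class PortRetry, the methods canBind and firstFreePort, and the example port 30007 are hypothetical names and values. The maximum number of attempts and the increment are both parameters, so either the power-of-10 constant or a different increment could be plugged in, and the availability check is passed in as a predicate so the loop can be exercised without opening real sockets.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.function.IntPredicate;

final class PortRetry {
    /** Returns true if a server socket can currently be bound to the port. */
    static boolean canBind(int port) {
        if (port < 1 || port > 65535) {
            return false;  // outside the valid TCP port range
        }
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;   // bound successfully; closed again on exit
        } catch (IOException e) {
            return false;  // typically a BindException: port already in use
        }
    }

    /**
     * Tries startPort, startPort + increment, and so on, for at most
     * maxAttempts tries. Returns the first port accepted by isFree,
     * or -1 if every attempt was rejected.
     */
    static int firstFreePort(int startPort, int increment, int maxAttempts,
                             IntPredicate isFree) {
        int port = startPort;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (isFree.test(port)) {
                return port;
            }
            port += increment;
        }
        return -1;
    }

    public static void main(String[] args) {
        int maxBindAttempts = 20;  // the default discussed in this thread
        int port = firstFreePort(30007, 100, maxBindAttempts, PortRetry::canBind);
        System.out.println(port == -1 ? "no free port found" : "would use port " + port);
    }
}
```

The probe in canBind releases the port immediately, so another process could take it before the real server binds. A production version would more likely keep the successfully bound socket, or start the RPC server directly inside the loop and catch the bind failure there.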

                

        

[jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195330#comment-13195330 ] 

jiraposter@reviews.apache.org commented on GIRAPH-128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3596/
-----------------------------------------------------------

(Updated 2012-01-28 01:15:26.114994)


Review request for giraph.


Changes
-------

Removed whitespace changes for MinimumIntCombiner.java and SimpleSumCombiner.java and made GiraphJob.MAX_RPC_PORT_BIND_ATTEMPTS configurable, defaulting to 20.


Summary
-------

Simple handling of port collisions on the same machine while preserving debuggability from the port number alone.  Round up the max number of workers to the next power of 10 and use it as the constant by which to increase the port number.

Added a unit test to ensure it is working correctly.

Fixed 2 minor warnings in
src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java
src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java

by removing 'import java.util.List'.


This addresses bug GIRAPH-128.
    https://issues.apache.org/jira/browse/GIRAPH-128


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1236935 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1236935 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/RPCCommunicationsTest.java PRE-CREATION 

Diff: https://reviews.apache.org/r/3596/diff


Testing
-------

Passed local and MR unit tests.


Thanks,

Avery


                

        

[jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193579#comment-13193579 ] 

Avery Ching commented on GIRAPH-128:
------------------------------------

Anyone want to review?  I think this will be very useful to get in before the release, since it makes it a lot easier for users to run multiple Giraph jobs on the same cluster simultaneously...
                

        

[jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192587#comment-13192587 ] 

jiraposter@reviews.apache.org commented on GIRAPH-128:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3596/
-----------------------------------------------------------

(Updated 2012-01-24 21:53:06.906563)


Review request for giraph.


Changes
-------

Updated after GIRAPH-124 was committed.


Summary
-------

Simple handling of port collisions on the same machine while preserving debuggability from the port number alone.  Round up the max number of workers to the next power of 10 and use it as the constant by which to increase the port number.

Added a unit test to ensure it is working correctly.

Fixed 2 minor warnings in
src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java
src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java

by removing 'import java.util.List'.


This addresses bug GIRAPH-128.
    https://issues.apache.org/jira/browse/GIRAPH-128


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1235026 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 1235026 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java 1235026 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/RPCCommunicationsTest.java PRE-CREATION 

Diff: https://reviews.apache.org/r/3596/diff


Testing
-------

Passed local and MR unit tests.


Thanks,

Avery


                

        

[jira] [Updated] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

Posted by "Avery Ching (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Avery Ching updated GIRAPH-128:
-------------------------------

    Attachment: GIRAPH-128.4.patch

Sorry, I missed the mocking question.  Fixed it here.
                


        

[jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

Posted by "Jakob Homan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195347#comment-13195347 ] 

Jakob Homan commented on GIRAPH-128:
------------------------------------

Any reason the question about mocks/extending the class wasn't addressed?
                


        

[jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195286#comment-13195286 ] 

Avery Ching commented on GIRAPH-128:
------------------------------------

Thanks for taking a look.  I forgot to upload the original (rb only for that one), hence part 2. 

The main motivation for the obscure case is that it would make debugging simpler.  We often see errors like serverX:portY, and can use portY to figure out which mapper to look at.  For example, currently the default starts at 30000.  If I see an error from 30001, then I know to go to mapper 1 to see its problem, and so on.  If I am running a 900 mapper job and the port is 31001 or 32001, I still know to look at mapper partition 1.  If instead I had 100 as the constant, then for 30101 I would have to check both mapper 1 and mapper 101.  With up to 20 retries per port, we can handle at least 20 simultaneous jobs running on a single machine that have the same mapper partition id.  First off, that is probably unlikely.  But even if it does happen, 20 is probably more than one machine would handle.  By the way, port retries are very fast (so I wouldn't worry too much about collisions).
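
The port arithmetic described above can be sketched roughly as follows.  This is only an illustration under stated assumptions, not the code from the attached patches: the class name PortRetrySketch and all of its method names are hypothetical, the bind uses a plain java.net.ServerSocket rather than Giraph's RPC server, and the check against port 65535 is an added guard that the discussion does not mention.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical sketch of the retry scheme; not the GIRAPH-128 patch itself. */
class PortRetrySketch {
    /** Highest valid TCP port (guard added for this sketch). */
    static final int MAX_PORT = 65535;

    /** Rounds the number of mappers up to a power of 10 (the "Z" in the issue). */
    static int portIncrement(int numMappers) {
        int z = 1;
        while (z < numMappers) {
            z *= 10;
        }
        return z;
    }

    /** Port tried on the given attempt (attempt 0 is the original base + partition). */
    static int candidatePort(int basePort, int partition, int z, int attempt) {
        return basePort + partition + attempt * z;
    }

    /**
     * Recovers the mapper partition from a port.  When basePort is a multiple
     * of z (e.g. 30000 and 1000) this equals the "port % Z" rule from the issue.
     */
    static int partitionFromPort(int port, int basePort, int z) {
        return (port - basePort) % z;
    }

    /** Tries up to maxBindAttempts ports, stepping by Z after each failed bind. */
    static ServerSocket bindWithRetry(int basePort, int partition,
                                      int numMappers, int maxBindAttempts)
            throws IOException {
        int z = portIncrement(numMappers);
        IOException lastFailure = null;
        for (int attempt = 0; attempt < maxBindAttempts; attempt++) {
            int port = candidatePort(basePort, partition, z, attempt);
            if (port > MAX_PORT) {
                break; // ran out of valid ports before running out of attempts
            }
            try {
                return new ServerSocket(port);
            } catch (IOException e) {
                lastFailure = e; // port is likely in use; try the next candidate
            }
        }
        throw new IOException("Could not bind a port for partition " + partition
                + " within " + maxBindAttempts + " attempts", lastFailure);
    }
}
```

With a base port of 30000 and a 900 mapper job, Z comes out as 1000, so mapper partition 1 would try 30001, 31001, 32001 and so on, and each of those ports still maps back to partition 1.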

Let me resubmit without the whitespace changes and with MAX_BIND_ATTEMPTS made configurable.
                


        

[jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

Posted by "Jakob Homan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195382#comment-13195382 ] 

Jakob Homan commented on GIRAPH-128:
------------------------------------

Great, thanks.  +1.
                
