Posted to dev@hama.apache.org by "Thomas Jungblut (Created) (JIRA)" <ji...@apache.org> on 2011/10/19 21:09:10 UTC

[jira] [Created] (HAMA-458) Add Combiners

Add Combiners
-------------

                 Key: HAMA-458
                 URL: https://issues.apache.org/jira/browse/HAMA-458
             Project: Hama
          Issue Type: New Feature
            Reporter: Thomas Jungblut
             Fix For: 0.5.0


We have spoken about it on the mailing list:
http://www.mail-archive.com/hama-dev@incubator.apache.org/msg05540.html

Small extract:

{quote}
E.g: when should the combiner be called and in which form. In MapReduce they
are plain reducers, so we can have the same model but not grouped by a key
rather than the class name of the message.
{quote}

This matters especially if we could collect all task data on the same host and combine it there, which would yield better performance.
We need a smart idea. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HAMA-458) Add Combiners

Posted by "Edward J. Yoon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HAMA-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HAMA-458:
--------------------------------

          Component/s: bsp
    Affects Version/s: 0.3.0
        Fix Version/s: 0.4.0
    
> Add Combiners
> -------------
>
>                 Key: HAMA-458
>                 URL: https://issues.apache.org/jira/browse/HAMA-458
>             Project: Hama
>          Issue Type: New Feature
>          Components: bsp
>    Affects Versions: 0.3.0
>            Reporter: Thomas Jungblut
>            Assignee: Edward J. Yoon
>             Fix For: 0.4.0
>
>         Attachments: combiner_v01.patch, combiner_v02.patch, combiner_v03.patch
>
>


        

[jira] [Assigned] (HAMA-458) Add Combiners

Posted by "Edward J. Yoon (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HAMA-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon reassigned HAMA-458:
-----------------------------------

    Assignee: Edward J. Yoon
    
> Add Combiners
> -------------
>
>                 Key: HAMA-458
>                 URL: https://issues.apache.org/jira/browse/HAMA-458
>             Project: Hama
>          Issue Type: New Feature
>            Reporter: Thomas Jungblut
>            Assignee: Edward J. Yoon
>         Attachments: combiner_v01.patch
>
>


        

[jira] [Commented] (HAMA-458) Add Combiners

Posted by "Edward J. Yoon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HAMA-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132330#comment-13132330 ] 

Edward J. Yoon commented on HAMA-458:
-------------------------------------

If possible, let's schedule this for 0.4.
                
> Add Combiners
> -------------
>
>                 Key: HAMA-458
>                 URL: https://issues.apache.org/jira/browse/HAMA-458
>             Project: Hama
>          Issue Type: New Feature
>            Reporter: Thomas Jungblut
>


        

[jira] [Commented] (HAMA-458) Add Combiners

Posted by "Edward J. Yoon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HAMA-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134835#comment-13134835 ] 

Edward J. Yoon commented on HAMA-458:
-------------------------------------

I'm taking this task.
                
> Add Combiners
> -------------
>
>                 Key: HAMA-458
>                 URL: https://issues.apache.org/jira/browse/HAMA-458
>             Project: Hama
>          Issue Type: New Feature
>            Reporter: Thomas Jungblut
>            Assignee: Edward J. Yoon
>         Attachments: combiner_v01.patch
>
>


        

[jira] [Commented] (HAMA-458) Add Combiners

Posted by "Thomas Jungblut (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HAMA-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134961#comment-13134961 ] 

Thomas Jungblut commented on HAMA-458:
--------------------------------------

Very nice. +1, but I didn't test it; I only reviewed the patch.
                
> Add Combiners
> -------------
>
>                 Key: HAMA-458
>                 URL: https://issues.apache.org/jira/browse/HAMA-458
>             Project: Hama
>          Issue Type: New Feature
>            Reporter: Thomas Jungblut
>            Assignee: Edward J. Yoon
>         Attachments: combiner_v01.patch, combiner_v02.patch, combiner_v03.patch
>
>


        

[jira] [Resolved] (HAMA-458) Add Combiners

Posted by "Edward J. Yoon (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HAMA-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon resolved HAMA-458.
---------------------------------

    Resolution: Fixed

Thanks for the review. It builds OK, and the tests passed on my cluster.

I'll just commit this before making other changes.

                
> Add Combiners
> -------------
>
>                 Key: HAMA-458
>                 URL: https://issues.apache.org/jira/browse/HAMA-458
>             Project: Hama
>          Issue Type: New Feature
>          Components: bsp
>    Affects Versions: 0.3.0
>            Reporter: Thomas Jungblut
>            Assignee: Edward J. Yoon
>             Fix For: 0.4.0
>
>         Attachments: combiner_v01.patch, combiner_v02.patch, combiner_v03.patch
>
>


        

[jira] [Updated] (HAMA-458) Add Combiners

Posted by "Edward J. Yoon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HAMA-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HAMA-458:
--------------------------------

    Attachment: combiner_v03.patch

Fixes a bug that occurred when sending messages locally.
                
> Add Combiners
> -------------
>
>                 Key: HAMA-458
>                 URL: https://issues.apache.org/jira/browse/HAMA-458
>             Project: Hama
>          Issue Type: New Feature
>            Reporter: Thomas Jungblut
>            Assignee: Edward J. Yoon
>         Attachments: combiner_v01.patch, combiner_v02.patch, combiner_v03.patch
>
>


        

[jira] [Updated] (HAMA-458) Add Combiners

Posted by "Edward J. Yoon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HAMA-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HAMA-458:
--------------------------------

    Attachment: combiner_v01.patch

I haven't tested this patch, but we can simply add a combiner function like this.

Communication can then be minimized, since only the summarized values are exchanged among the peers in each superstep.
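
To illustrate the point about exchanging only summarized values, here is a minimal sketch that does not use the Hama API. The class `CombinerSketch` and its `combine` method are hypothetical names chosen for this example, and plain integers stand in for messages. It only shows that several queued messages for one peer collapse into a single message before anything is sent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a message combiner; not part of Hama.
final class CombinerSketch {

  // Collapses all integer messages queued for one peer into a single
  // message carrying their sum. An empty queue yields one message with 0.
  static List<Integer> combine(List<Integer> outgoing) {
    int sum = 0;
    for (int value : outgoing) {
      sum += value;
    }
    List<Integer> combined = new ArrayList<>();
    combined.add(sum);
    return combined;
  }

  public static void main(String[] args) {
    List<Integer> outgoing = Arrays.asList(3, 4, 5);
    List<Integer> combined = combine(outgoing);
    // prints "3 messages -> 1 message: [12]"
    System.out.println(outgoing.size() + " messages -> "
        + combined.size() + " message: " + combined);
  }
}
```

With a sum like this, the receiver gets the same total either way, so combining on the sender side trades a little local work for fewer messages on the wire.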
                
> Add Combiners
> -------------
>
>                 Key: HAMA-458
>                 URL: https://issues.apache.org/jira/browse/HAMA-458
>             Project: Hama
>          Issue Type: New Feature
>            Reporter: Thomas Jungblut
>         Attachments: combiner_v01.patch
>
>


        

[jira] [Updated] (HAMA-458) Add Combiners

Posted by "Thomas Jungblut (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HAMA-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Jungblut updated HAMA-458:
---------------------------------

      Description: 
We have spoken about it on the mailing list:
http://www.mail-archive.com/hama-dev@incubator.apache.org/msg05540.html

Small extract:

{quote}
When should the combiner be called and in which form. In MapReduce they
are plain reducers, so we can have the same model but not grouped by a key
rather than the class name of the message.
{quote}

Especially if we could collect all task data on the same host and combine them. This would yield into better performance.
We need a smart idea. 

  was:
We have spoken about it on the mailing list:
http://www.mail-archive.com/hama-dev@incubator.apache.org/msg05540.html

Small extract:

{quote}
E.g: when should the combiner be called and in which form. In MapReduce they
are plain reducers, so we can have the same model but not grouped by a key
rather than the class name of the message.
{quote}

Especially if we could collect all task data on the same host and combine them. This would yield into better performance.
We need a smart idea. 

    Fix Version/s:     (was: 0.5.0)
    
> Add Combiners
> -------------
>
>                 Key: HAMA-458
>                 URL: https://issues.apache.org/jira/browse/HAMA-458
>             Project: Hama
>          Issue Type: New Feature
>            Reporter: Thomas Jungblut
>


        

[jira] [Updated] (HAMA-458) Add Combiners

Posted by "Edward J. Yoon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HAMA-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HAMA-458:
--------------------------------

    Attachment: combiner_v02.patch

Here's my final patch.

A message combiner can be implemented as below:

{code}
  public static class SumCombiner extends Combiner {

    // Collapses all queued messages into a single message carrying their sum.
    @Override
    public BSPMessageBundle combine(Iterable<BSPMessage> messages) {
      BSPMessageBundle bundle = new BSPMessageBundle();
      int sum = 0;

      // Every message is expected to be an IntegerMessage; add up the payloads.
      for (BSPMessage message : messages) {
        sum += ((IntegerMessage) message).getData();
      }

      // Send one summarized message instead of the individual ones.
      bundle.addMessage(new IntegerMessage("Sum", sum));
      return bundle;
    }

  }
{code}

The tests passed.
                
> Add Combiners
> -------------
>
>                 Key: HAMA-458
>                 URL: https://issues.apache.org/jira/browse/HAMA-458
>             Project: Hama
>          Issue Type: New Feature
>            Reporter: Thomas Jungblut
>            Assignee: Edward J. Yoon
>         Attachments: combiner_v01.patch, combiner_v02.patch
>
>


        

[jira] [Commented] (HAMA-458) Add Combiners

Posted by "Edward J. Yoon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HAMA-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135602#comment-13135602 ] 

Edward J. Yoon commented on HAMA-458:
-------------------------------------

{code}

11/10/26 09:52:42 FATAL bsp.BSPPeerImpl: Caught exception during superstep 0!
java.lang.RuntimeException: java.lang.InstantiationException
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
        at org.apache.hama.bsp.BSPPeerImpl.sync(BSPPeerImpl.java:195)
        at org.apache.hama.examples.PiEstimator$MyEstimator.bsp(PiEstimator.java:70)
        at org.apache.hama.bsp.BSPTask.run(BSPTask.java:62)
        at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:946)
Caused by: java.lang.InstantiationException
        at sun.reflect.InstantiationExceptionConstructorAccessorImpl.newInstance(InstantiationExceptionConstructorAccessorImpl.java:30)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113)
        ... 4 more
{code}

Found a bug. I'm fixing it directly.
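
The thread does not state the root cause of this exception. For readers unfamiliar with it, one common way to get `java.lang.InstantiationException` from a reflective `Constructor.newInstance` call is to pass an abstract class. The sketch below reproduces that in isolation. The names `InstantiationSketch`, `AbstractCombiner`, `SumCombiner`, and `tryInstantiate` are hypothetical and are not taken from the patch; whether this is what happened in `BSPPeerImpl.sync` is an assumption.

```java
import java.lang.reflect.Constructor;

// Hypothetical reproduction; none of these classes come from Hama.
final class InstantiationSketch {

  // An abstract base class cannot be instantiated, even reflectively.
  abstract static class AbstractCombiner {
    AbstractCombiner() {}
  }

  // A concrete subclass with a no-argument constructor can be.
  static final class SumCombiner extends AbstractCombiner {
    SumCombiner() {}
  }

  // Tries to create an instance through the no-argument constructor and
  // reports "ok" or the simple name of the reflective exception thrown.
  static String tryInstantiate(Class<?> cls) {
    try {
      Constructor<?> ctor = cls.getDeclaredConstructor();
      ctor.setAccessible(true);
      ctor.newInstance();
      return "ok";
    } catch (ReflectiveOperationException e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    // prints "InstantiationException"
    System.out.println(tryInstantiate(AbstractCombiner.class));
    // prints "ok"
    System.out.println(tryInstantiate(SumCombiner.class));
  }
}
```

If a framework instantiates a user-configurable class by reflection, guarding the call so that it only runs when a concrete class has actually been configured avoids this failure mode.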
                
> Add Combiners
> -------------
>
>                 Key: HAMA-458
>                 URL: https://issues.apache.org/jira/browse/HAMA-458
>             Project: Hama
>          Issue Type: New Feature
>          Components: bsp
>    Affects Versions: 0.3.0
>            Reporter: Thomas Jungblut
>            Assignee: Edward J. Yoon
>             Fix For: 0.4.0
>
>         Attachments: combiner_v01.patch, combiner_v02.patch, combiner_v03.patch
>
>
