Posted to dev@hama.apache.org by "praveen sripati (JIRA)" <ji...@apache.org> on 2012/05/05 03:47:48 UTC

[jira] [Created] (HAMA-569) Make Hama scalable as more processing is done

praveen sripati created HAMA-569:
------------------------------------

             Summary: Make Hama scalable as more processing is done
                 Key: HAMA-569
                 URL: https://issues.apache.org/jira/browse/HAMA-569
             Project: Hama
          Issue Type: Improvement
          Components: bsp core
    Affects Versions: 0.4.0, 0.5.0
            Reporter: praveen sripati
             Fix For: 0.6.0


Currently Hama doesn't scale dynamically. Once a job has been submitted, the number of BSP tasks is fixed, so the resources (and cost) tied to the job are fixed regardless of load. This JIRA is to evaluate whether Hama can be made to scale automatically after a job has been submitted, and to provide a solution. This applies to both batch and real-time processing.

For example, in real-time processing, the number of BSP tasks stays the same after the job has been submitted, whether there is 1 input or a million inputs per second.
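To make the fixed-versus-elastic distinction concrete, here is a minimal sketch of a sizing rule an elastic scheduler might apply, assuming the throughput of a single task is known. The class name `TaskCountPolicy`, its bounds, and the capacity figure are hypothetical and not part of Hama's API; the sketch only computes a target task count and does not resize a running job.

```java
// Hypothetical sizing rule for an elastic BSP job; not a Hama API.
final class TaskCountPolicy {
    private final int minTasks;
    private final int maxTasks;
    private final double eventsPerTaskPerSecond; // assumed measured capacity of one task

    TaskCountPolicy(int minTasks, int maxTasks, double eventsPerTaskPerSecond) {
        if (minTasks < 1 || maxTasks < minTasks || eventsPerTaskPerSecond <= 0) {
            throw new IllegalArgumentException("invalid policy bounds");
        }
        this.minTasks = minTasks;
        this.maxTasks = maxTasks;
        this.eventsPerTaskPerSecond = eventsPerTaskPerSecond;
    }

    // Tasks needed to absorb the observed input rate, clamped to [minTasks, maxTasks].
    int desiredTasks(double observedEventsPerSecond) {
        int needed = (int) Math.ceil(observedEventsPerSecond / eventsPerTaskPerSecond);
        return Math.max(minTasks, Math.min(maxTasks, needed));
    }

    public static void main(String[] args) {
        TaskCountPolicy policy = new TaskCountPolicy(1, 100, 10_000);
        for (double rate : new double[] {1, 1_000_000, 5_000_000}) {
            System.out.println(rate + " events/s -> " + policy.desiredTasks(rate) + " tasks");
        }
    }
}
```

With a capacity of 10,000 events/s per task and bounds of 1 to 100, a rate of 1 event/s yields 1 task while 1,000,000 events/s yields 100 tasks, whereas with a fixed task count both rates would run on the same number of tasks.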

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HAMA-569) Make Hama scalable as more processing is done

Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HAMA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277708#comment-13277708 ] 

Edward J. Yoon commented on HAMA-569:
-------------------------------------

+1. Actually, many people around me have asked about this feature for real-time processing.

[jira] [Commented] (HAMA-569) Make Hama scalable as more processing is done

Posted by "praveen sripati (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HAMA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279500#comment-13279500 ] 

praveen sripati commented on HAMA-569:
--------------------------------------

Where the processing of an event is independent of earlier events, it should theoretically be possible (setting aside any Hama limitations) to scale the number of BSP nodes up or down. Whenever there is a dependency, we need to keep track of the number of BSP nodes at every instant to know which BSP node is processing a given event.
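The distinction between independent and dependent events can be sketched as two routing rules. This is an illustrative assumption, not existing Hama code: `EventRouter` is a hypothetical name, and hash-modulo routing is just one simple way to pin a key to a node.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration, not a Hama API: two ways to choose a BSP node for an event.
final class EventRouter {
    private final AtomicLong counter = new AtomicLong();

    // Independent events: any node can process them, so spread them round-robin
    // over however many nodes exist right now; nodeCount may change between calls.
    int routeIndependent(int nodeCount) {
        if (nodeCount <= 0) throw new IllegalArgumentException("nodeCount must be positive");
        return (int) Math.floorMod(counter.getAndIncrement(), (long) nodeCount);
    }

    // Dependent events: every event with the same key must reach the same node,
    // so the target depends on both the key and the current node count.
    static int routeByKey(String key, int nodeCount) {
        if (nodeCount <= 0) throw new IllegalArgumentException("nodeCount must be positive");
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        // Resizing from 4 to 5 nodes sends key "a" to a different node.
        System.out.println("4 nodes: " + routeByKey("a", 4) + ", 5 nodes: " + routeByKey("a", 5));
    }
}
```

Because `routeByKey` depends on `nodeCount`, resizing from 4 to 5 nodes moves key `"a"` (hash 97) from node 1 to node 2, which is why the current node count must be tracked (and per-key state possibly migrated) whenever events depend on each other. Consistent hashing would reduce, but not eliminate, such moves.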

I couldn't find much literature on using BSP for real-time processing or on scaling BSP up/down. The closest I could find is "Adaptive Parallelism in the Bulk-Synchronous Parallel Model" (I only glanced through it quickly):

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.9686

Also, it looks like Cloudscale (http://www.cloudscale.com/index.php/technology/cloudscale-bsp) uses a form of BSP that is scalable.

[jira] [Commented] (HAMA-569) Make Hama scalable as more processing is done

Posted by "praveen sripati (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HAMA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284432#comment-13284432 ] 

praveen sripati commented on HAMA-569:
--------------------------------------

Having a single master send data to multiple BSP nodes will not scale; the single master will be the bottleneck.

Two approaches I can think of:

- Have multiple masters. The client gets the list of masters and sends messages to them in round-robin fashion.

- As in HDFS, the client gets the list of BSP nodes and sends messages directly to them, bypassing the master.

In either case, the client should be notified of any changes to that list.

Any thoughts?
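Both approaches share the same client-side logic: keep a list of endpoints (masters in the first approach, BSP nodes in the second), spread messages over them round-robin, and replace the list when notified of a change. A minimal sketch, assuming a push-style notification; `ScalingClient` and `onMembershipChange` are hypothetical names, and the actual network send is omitted.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical client-side sketch (not a Hama API). The endpoints are either the
// masters (approach 1) or the BSP nodes themselves (approach 2, HDFS-style).
final class ScalingClient {
    private final AtomicReference<List<String>> endpoints;
    private final AtomicInteger next = new AtomicInteger();

    ScalingClient(List<String> initial) {
        if (initial.isEmpty()) throw new IllegalArgumentException("need at least one endpoint");
        endpoints = new AtomicReference<>(List.copyOf(initial));
    }

    // Invoked when the cluster notifies the client that endpoints were added or removed.
    void onMembershipChange(List<String> updated) {
        if (updated.isEmpty()) throw new IllegalArgumentException("need at least one endpoint");
        endpoints.set(List.copyOf(updated));
    }

    // Chooses the endpoint for the next message, round-robin over the current snapshot.
    // The real send (RPC, socket, etc.) is deliberately left out.
    String pickEndpoint() {
        List<String> snapshot = endpoints.get();
        return snapshot.get(Math.floorMod(next.getAndIncrement(), snapshot.size()));
    }

    public static void main(String[] args) {
        ScalingClient client = new ScalingClient(List.of("master-1", "master-2"));
        System.out.println(client.pickEndpoint() + " " + client.pickEndpoint());
        client.onMembershipChange(List.of("master-1", "master-2", "master-3"));
        System.out.println(client.pickEndpoint());
    }
}
```

Swapping an immutable snapshot through an `AtomicReference` lets concurrent senders keep using a consistent list while a membership update is applied.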

[jira] [Updated] (HAMA-569) Make Hama scalable as more processing is done

Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HAMA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HAMA-569:
--------------------------------

    Affects Version/s:     (was: 0.5.0)
                           (was: 0.4.0)
        Fix Version/s:     (was: 0.6.0)

[jira] [Commented] (HAMA-569) Make Hama scalable as more processing is done

Posted by "Thomas Jungblut (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HAMA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412833#comment-13412833 ] 

Thomas Jungblut commented on HAMA-569:
--------------------------------------

We could use GraphChi's Parallel Sliding Windows technique, so that not all tasks need to run in parallel.

https://code.google.com/p/graphchi/wiki/IntroductionToGraphChi

The idea is really smart; we should incorporate it into our framework.
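A much simplified, in-memory sketch of the Parallel Sliding Windows access pattern described on the GraphChi page: vertices are split into P intervals, shard p holds the edges whose destination falls in interval p, sorted by source, and each interval is processed on its own by reading its shard in full (its in-edges) plus one contiguous window from every shard (its out-edges). The class name and the degree-counting "update" are assumptions for illustration; real GraphChi streams shards from disk and runs user-defined vertex updates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Simplified in-memory sketch of GraphChi's Parallel Sliding Windows (PSW) access pattern.
// Real PSW streams shards from disk; here shards are lists and the "update" just counts degrees.
final class SlidingWindowsSketch {
    record Edge(int src, int dst) {}

    private final int numVertices;
    private final int numIntervals;
    private final int intervalSize;
    private final List<List<Edge>> shards = new ArrayList<>();

    SlidingWindowsSketch(int numVertices, int numIntervals, List<Edge> edges) {
        if (numVertices <= 0 || numIntervals <= 0) throw new IllegalArgumentException("sizes must be positive");
        this.numVertices = numVertices;
        this.numIntervals = numIntervals;
        this.intervalSize = (numVertices + numIntervals - 1) / numIntervals; // ceil(V / P)
        for (int p = 0; p < numIntervals; p++) shards.add(new ArrayList<>());
        // Shard p holds every edge whose destination lies in interval p ...
        for (Edge e : edges) shards.get(intervalOf(e.dst())).add(e);
        // ... sorted by source, so the out-edges of any interval form one contiguous block.
        for (List<Edge> shard : shards) shard.sort(Comparator.comparingInt(Edge::src));
    }

    private int intervalOf(int vertex) {
        return vertex / intervalSize;
    }

    // Processes one interval at a time; returns {inDegree, outDegree}.
    int[][] run() {
        int[] inDeg = new int[numVertices];
        int[] outDeg = new int[numVertices];
        int[] windowStart = new int[numIntervals]; // current window position in each shard
        for (int i = 0; i < numIntervals; i++) {
            // Memory shard: all in-edges of interval i live in shard i.
            for (Edge e : shards.get(i)) inDeg[e.dst()]++;
            // Sliding windows: out-edges of interval i are the next block of every shard.
            for (int j = 0; j < numIntervals; j++) {
                List<Edge> shard = shards.get(j);
                int k = windowStart[j];
                while (k < shard.size() && intervalOf(shard.get(k).src()) == i) {
                    outDeg[shard.get(k).src()]++;
                    k++;
                }
                windowStart[j] = k; // the window slides forward for interval i + 1
            }
        }
        return new int[][] {inDeg, outDeg};
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(new Edge(0, 1), new Edge(0, 4), new Edge(2, 3),
                new Edge(3, 0), new Edge(4, 5), new Edge(5, 1), new Edge(1, 2));
        int[][] degrees = new SlidingWindowsSketch(6, 2, edges).run();
        System.out.println("in:  " + Arrays.toString(degrees[0]));
        System.out.println("out: " + Arrays.toString(degrees[1]));
    }
}
```

Only one interval's edges are touched at a time, which is what would let a fixed set of tasks work through a larger problem sequentially instead of requiring all of it to be processed in parallel.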