Posted to dev@hama.apache.org by "praveen sripati (JIRA)" <ji...@apache.org> on 2012/05/05 03:47:48 UTC
[jira] [Created] (HAMA-569) Make Hama scalable as more processing is done
praveen sripati created HAMA-569:
------------------------------------
Summary: Make Hama scalable as more processing is done
Key: HAMA-569
URL: https://issues.apache.org/jira/browse/HAMA-569
Project: Hama
Issue Type: Improvement
Components: bsp core
Affects Versions: 0.4.0, 0.5.0
Reporter: praveen sripati
Fix For: 0.6.0
Currently, Hama doesn't scale dynamically. Once a job has been submitted, the number of BSP tasks is fixed, so there are fixed costs associated with the job. This JIRA is to evaluate whether Hama can be made to scale automatically after the job has been submitted, and to provide a solution for it. This applies to both batch and real-time processing.
For example, in the case of real-time processing, the number of BSP tasks remains the same once the job has been submitted, whether the load is 1 or a million inputs per second.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HAMA-569) Make Hama scalable as more processing is done
Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HAMA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277708#comment-13277708 ]
Edward J. Yoon commented on HAMA-569:
-------------------------------------
+1. Actually, many people around me have asked about this feature for real-time processing.
[jira] [Commented] (HAMA-569) Make Hama scalable as more processing is done
Posted by "praveen sripati (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HAMA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279500#comment-13279500 ]
praveen sripati commented on HAMA-569:
--------------------------------------
Where the processing of an event is independent of earlier events, it should be theoretically possible (setting aside any Hama limitations) to scale the number of BSP nodes up or down. Wherever there is a dependency, the number of BSP nodes has to be tracked at every instant in order to know which BSP node is processing a given event.
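To illustrate why the node count matters for dependent events, here is a minimal sketch. It assumes events are routed to nodes by hashing a key modulo the number of nodes; that routing scheme and the names `owner` and `moved_keys` are hypothetical and are not taken from Hama.

```python
import zlib


def owner(event_key: str, num_nodes: int) -> int:
    """Return the index of the node that handles this key (assumed scheme: hash modulo node count)."""
    return zlib.crc32(event_key.encode("utf-8")) % num_nodes


def moved_keys(keys, old_nodes: int, new_nodes: int):
    """Return the keys whose owning node changes when the node count changes."""
    return [k for k in keys if owner(k, old_nodes) != owner(k, new_nodes)]


if __name__ == "__main__":
    keys = [f"event-{i}" for i in range(1000)]
    moved = moved_keys(keys, 4, 5)
    print(f"{len(moved)} of {len(keys)} keys change owner when scaling from 4 to 5 nodes")
```

Under this assumption, independent events can go to any node, but an event that depends on earlier state must reach the node holding that state, so the mapping in effect at each instant has to be known.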
I couldn't find much literature on using BSP for real-time processing or on scaling BSP up/down. The closest I could get is "Adaptive Parallelism in the Bulk-Synchronous Parallel Model" (I only glanced through it quickly):
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.9686
Also, it looks like Cloudscale (http://www.cloudscale.com/index.php/technology/cloudscale-bsp) uses a BSP implementation that is scalable.
[jira] [Commented] (HAMA-569) Make Hama scalable as more processing is done
Posted by "praveen sripati (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HAMA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284432#comment-13284432 ]
praveen sripati commented on HAMA-569:
--------------------------------------
Having a single master send data to multiple BSP nodes will not scale; the single master will be the bottleneck.
Two approaches I can think of:
- Have multiple masters. The client gets the list of masters and sends messages to them in a round-robin fashion.
- As in HDFS, the client gets the list of BSP nodes and sends messages directly to them, bypassing the master.
In either case, the client should be notified of any changes.
Any thoughts?
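Both approaches above share the same client-side shape: hold a list of endpoints (masters or BSP nodes), rotate through it, and replace it when notified of a change. A minimal sketch follows; the class `RoundRobinClient` and its methods are hypothetical names for illustration, not an existing Hama API, and how the notification is delivered is left open.

```python
import threading


class RoundRobinClient:
    """Picks a target endpoint for each message in round-robin order."""

    def __init__(self, endpoints):
        self._lock = threading.Lock()
        self._endpoints = list(endpoints)
        self._next = 0

    def on_membership_change(self, endpoints):
        """Called when the client is notified that the endpoint list changed."""
        with self._lock:
            self._endpoints = list(endpoints)
            self._next = 0

    def pick(self):
        """Return the endpoint that should receive the next message."""
        with self._lock:
            if not self._endpoints:
                raise RuntimeError("no endpoints available")
            endpoint = self._endpoints[self._next % len(self._endpoints)]
            self._next += 1
            return endpoint


if __name__ == "__main__":
    client = RoundRobinClient(["master-1", "master-2", "master-3"])
    print([client.pick() for _ in range(4)])
    client.on_membership_change(["master-2", "master-4"])
    print([client.pick() for _ in range(2)])
```

The lock is there only because notifications would probably arrive on a different thread from the one sending messages.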
[jira] [Updated] (HAMA-569) Make Hama scalable as more processing is done
Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HAMA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Edward J. Yoon updated HAMA-569:
--------------------------------
Affects Version/s: (was: 0.5.0)
(was: 0.4.0)
Fix Version/s: (was: 0.6.0)
[jira] [Commented] (HAMA-569) Make Hama scalable as more processing is done
Posted by "Thomas Jungblut (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HAMA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412833#comment-13412833 ]
Thomas Jungblut commented on HAMA-569:
--------------------------------------
We can use the parallel sliding windows technique, so that not all tasks need to run in parallel.
https://code.google.com/p/graphchi/wiki/IntroductionToGraphChi
The idea is really smart; we should incorporate it into our framework.
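For readers unfamiliar with it, here is a simplified in-memory sketch of parallel sliding windows as the GraphChi introduction describes it: vertices are split into intervals, each interval has a shard holding the edges whose destination lies in that interval, sorted by source, and intervals are processed one at a time. The function names and the integer vertex ids are assumptions for illustration; real shards live on disk, and nothing here is Hama code.

```python
from bisect import bisect_left


def build_shards(edges, intervals):
    """Shard p holds the edges whose destination is in intervals[p], sorted by source."""
    shards = [[] for _ in intervals]
    for src, dst in edges:
        for p, (lo, hi) in enumerate(intervals):
            if lo <= dst <= hi:
                shards[p].append((src, dst))
                break
    for shard in shards:
        shard.sort()
    return shards


def process_intervals(shards, intervals):
    """Yield (interval index, in-edges, out-edges), one interval at a time."""
    for p, (lo, hi) in enumerate(intervals):
        in_edges = list(shards[p])  # the interval's own shard is loaded in full
        out_edges = []
        for shard in shards:
            # Shards are sorted by source, so the edges leaving this interval
            # form one contiguous window in every shard.
            start = bisect_left(shard, (lo,))
            end = bisect_left(shard, (hi + 1,))
            out_edges.extend(shard[start:end])
        yield p, in_edges, out_edges


if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (3, 1), (4, 1), (1, 4), (3, 4)]
    intervals = [(1, 2), (3, 4)]
    for p, ins, outs in process_intervals(build_shards(edges, intervals), intervals):
        print(p, ins, outs)
```

Only one interval's edges are needed at a time, which is why the work does not all have to be resident or running at once.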