You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hama.apache.org by Thomas Jungblut <th...@googlemail.com> on 2011/10/14 15:54:10 UTC

Checkpointer Process

Hi all.
My idea:
Since YARN and multitasking we should consider moving the Checkpointer
process into the BSPPeer itself instead of a single process.

It would be great if we could discuss what would be the real advantage and
disadvantage of integrating it in the same process / a daemon process.

-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Checkpointer Process

Posted by ChiaHung Lin <ch...@nuk.edu.tw>.
Personally I will go for option 2 or 3, but there is no problem if we decide to choose alternative. 

Some other thoughts I want to address because we may still face those issues in the future:

1. Fault may come from somewhere out of expectation. 
Previously some mapreduce users encountered process exits because of jvm bugs triggered by some specific class/ method. This problem is not something we can programme to cope with beforehand. And I think it is not our problem, but something e.g. software complexity, etc. The fault isolation in such case can help prevent corrupting process that functions well.   

2. Recovery time may be an overhead.
Ideally the recovery phase would only require the latest checkpointed data; however, it would be worth to check how long it takes to recover because vast amount of checkpointed data may take longer time to recover. Also, if failure cases increase, this presumably would incur the overhead for master server in dealing with works such as rescheduling/ repairing tasks, which further degrades system performance. 

In addition, it might be good if we can find a way to overcome yarn's restriction in the future because I think we are building project Hama. : ) 

At the moment I would like focus more on I/O issue. So for integrating bsp task with checkpoint, please feel free to work on it or I will take it later on. 

-----Original message-----
From:Thomas Jungblut <th...@googlemail.com>
To:hama-dev@incubator.apache.org
Date:Tue, 25 Oct 2011 20:40:38 +0200
Subject:Re: Checkpointer Process

Thank you very much for the clarification ChiaHung.
Since point 1 is the solution with less impact on YARN and our system, I'm
+1 too.

The problem for the first one is if the checkpoining process fails, user
> tasks may fail as well, which is an unwanted behaviour for users.


Yes you are correct. But I think if we can design it very robust (don't know
if this word exists in english) the user task should not fail at all.
For example, we are checkpointing but the local datanode is not available.
This causes huge amounts of wait time until the node is available. If it is
never available, the checkpointing will fail and the user task will fail
too.
But actually, datanode failure isn't our problem so it may be a bad example.
Anyways, we can make it that much robust that the task wouldn't fail even if
checkpointing isn't possible.
For example we could put a try/catch statement arround it. Unless we make
any bad errors like memory leaks, this should really be possible.

P.S: your solution with a single process which got all the data through a
socket was quite elegant.
But in YARN we can't allocate an additional container on every node where a
container has spawned.
Since we don't want to diverge the two implementations (especially in the
BSPPeer) we should focus on the solution with the least impact for the user
and less implementational work.

In Part1 we just have to remove the additional process and integrate the
code into the BSPPeer. This shouldn't be any more work than 5-8h's,
including testing with YARN and normal-HAMA.
So I think we should put this into 0.4.0.

If you want, I can help you, or even do the task.

2011/10/25 Edward J. Yoon <ed...@apache.org>
>
> P.S., I know this task is not easy. Should we re-scheduling this to 0.5
release?
>
> On Tue, Oct 25, 2011 at 5:22 PM, Edward J. Yoon <ed...@apache.org>
wrote:
> >> 1.) Checkpointer runs on the same process with bsp task.
> >> 2.) A separated checkpointing process per bsp task on each machine.
> >> 3.) A separated checkopinting process per machine.
> >> 4.) Checkpointing processes in forms of server farm.
> >
> > When some task fails, the whole tasks will be re-started with previous
> > checkpoint data. Right?
> >
> > I'm +1 for the first idea. I believe this way is simple and reliable.
> >
> > 2011/10/25 ChiaHung Lin <ch...@nuk.edu.tw>:
> >> Just some thoughts on why to programme checkpointer as separated
process. The idea is centered on isolation. Because fault will occur,
ensuring that failures/ errors would not adversely affect other parts of the
system becomes critical. Also, performing user tasks and saving data to hdfs
are two different issues so our goal is to ensure user tasks would
continuously work even if checkpointing process fails. As long as user tasks
keep continuously performing their job smoothly, checkpointing process can
be ignored.
> >>
> >> There were 4 options considered previously:
> >>
> >> 1.) Checkpointer runs on the same process with bsp task.
> >> 2.) A separated checkpointing process per bsp task on each machine.
> >> 3.) A separated checkopinting process per machine.
> >> 4.) Checkpointing processes in forms of server farm.
> >>
> >> The problem for the first one is if the checkpoining process fails,
user tasks may fail as well, which is an unwanted behaviour for users. The
fourth has a problem that it affects arbitrary user tasks for recovery if
both processes fail. The second and third is similar except that the second
option would min user tasks to be affected if both processes fail. Running
checkpointer as separated process has an advantage that if only
checkpointing process fails, it is not necessary to recover. For example,
suppose a BSP job performs its tasks from supersteps 1 to 10. At the same
time a separated checkpointing process stands by. In the first 3 supersteps,
both processes work well. After the supersteps 4, the checkpointing process
fails, but the user task is continuously doing it task. At the supersteps 7,
the checkpointer is back (e.g. restart). And if user task keeps working
until it finishes, there is no need to perform recovery in this case. If bsp
task fails after checkpointing process is back, the system has chances to
recover from the latest snapshot.
> >>
> >> I understand the current implementation is not perfect. But that would
be good if we can work toward this direction because these are recommended
to the best of my knowledge.
> >>
> >> -----Original message-----
> >> From:Thomas Jungblut <th...@googlemail.com>
> >> To:hama-dev@incubator.apache.org
> >> Date:Fri, 14 Oct 2011 15:54:10 +0200
> >> Subject:Checkpointer Process
> >>
> >> Hi all.
> >> My idea:
> >> Since YARN and multitasking we should consider moving the Checkpointer
> >> process into the BSPPeer itself instead of a single process.
> >>
> >> It would be great if we could discuss what would be the real advantage
and
> >> disadvantage of integrating it in the same process / a daemon process.
> >>
> >> --
> >> Thomas Jungblut
> >> Berlin <th...@gmail.com>
> >>
> >>
> >> --
> >> ChiaHung Lin
> >> Department of Information Management
> >> National University of Kaohsiung
> >> Taiwan
> >>
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
> >
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon



--
Thomas Jungblut
Berlin


--
ChiaHung Lin
Department of Information Management
National University of Kaohsiung
Taiwan

Re: Checkpointer Process

Posted by Thomas Jungblut <th...@googlemail.com>.
Thank you very much for the clarification ChiaHung.
Since point 1 is the solution with less impact on YARN and our system, I'm
+1 too.

The problem for the first one is if the checkpoining process fails, user
> tasks may fail as well, which is an unwanted behaviour for users.


Yes you are correct. But I think if we can design it very robust (don't know
if this word exists in english) the user task should not fail at all.
For example, we are checkpointing but the local datanode is not available.
This causes huge amounts of wait time until the node is available. If it is
never available, the checkpointing will fail and the user task will fail
too.
But actually, datanode failure isn't our problem so it may be a bad example.
Anyways, we can make it that much robust that the task wouldn't fail even if
checkpointing isn't possible.
For example we could put a try/catch statement arround it. Unless we make
any bad errors like memory leaks, this should really be possible.

P.S: your solution with a single process which got all the data through a
socket was quite elegant.
But in YARN we can't allocate an additional container on every node where a
container has spawned.
Since we don't want to diverge the two implementations (especially in the
BSPPeer) we should focus on the solution with the least impact for the user
and less implementational work.

In Part1 we just have to remove the additional process and integrate the
code into the BSPPeer. This shouldn't be any more work than 5-8h's,
including testing with YARN and normal-HAMA.
So I think we should put this into 0.4.0.

If you want, I can help you, or even do the task.

2011/10/25 Edward J. Yoon <ed...@apache.org>
>
> P.S., I know this task is not easy. Should we re-scheduling this to 0.5
release?
>
> On Tue, Oct 25, 2011 at 5:22 PM, Edward J. Yoon <ed...@apache.org>
wrote:
> >> 1.) Checkpointer runs on the same process with bsp task.
> >> 2.) A separated checkpointing process per bsp task on each machine.
> >> 3.) A separated checkopinting process per machine.
> >> 4.) Checkpointing processes in forms of server farm.
> >
> > When some task fails, the whole tasks will be re-started with previous
> > checkpoint data. Right?
> >
> > I'm +1 for the first idea. I believe this way is simple and reliable.
> >
> > 2011/10/25 ChiaHung Lin <ch...@nuk.edu.tw>:
> >> Just some thoughts on why to programme checkpointer as separated
process. The idea is centered on isolation. Because fault will occur,
ensuring that failures/ errors would not adversely affect other parts of the
system becomes critical. Also, performing user tasks and saving data to hdfs
are two different issues so our goal is to ensure user tasks would
continuously work even if checkpointing process fails. As long as user tasks
keep continuously performing their job smoothly, checkpointing process can
be ignored.
> >>
> >> There were 4 options considered previously:
> >>
> >> 1.) Checkpointer runs on the same process with bsp task.
> >> 2.) A separated checkpointing process per bsp task on each machine.
> >> 3.) A separated checkopinting process per machine.
> >> 4.) Checkpointing processes in forms of server farm.
> >>
> >> The problem for the first one is if the checkpoining process fails,
user tasks may fail as well, which is an unwanted behaviour for users. The
fourth has a problem that it affects arbitrary user tasks for recovery if
both processes fail. The second and third is similar except that the second
option would min user tasks to be affected if both processes fail. Running
checkpointer as separated process has an advantage that if only
checkpointing process fails, it is not necessary to recover. For example,
suppose a BSP job performs its tasks from supersteps 1 to 10. At the same
time a separated checkpointing process stands by. In the first 3 supersteps,
both processes work well. After the supersteps 4, the checkpointing process
fails, but the user task is continuously doing it task. At the supersteps 7,
the checkpointer is back (e.g. restart). And if user task keeps working
until it finishes, there is no need to perform recovery in this case. If bsp
task fails after checkpointing process is back, the system has chances to
recover from the latest snapshot.
> >>
> >> I understand the current implementation is not perfect. But that would
be good if we can work toward this direction because these are recommended
to the best of my knowledge.
> >>
> >> -----Original message-----
> >> From:Thomas Jungblut <th...@googlemail.com>
> >> To:hama-dev@incubator.apache.org
> >> Date:Fri, 14 Oct 2011 15:54:10 +0200
> >> Subject:Checkpointer Process
> >>
> >> Hi all.
> >> My idea:
> >> Since YARN and multitasking we should consider moving the Checkpointer
> >> process into the BSPPeer itself instead of a single process.
> >>
> >> It would be great if we could discuss what would be the real advantage
and
> >> disadvantage of integrating it in the same process / a daemon process.
> >>
> >> --
> >> Thomas Jungblut
> >> Berlin <th...@gmail.com>
> >>
> >>
> >> --
> >> ChiaHung Lin
> >> Department of Information Management
> >> National University of Kaohsiung
> >> Taiwan
> >>
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
> >
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon



--
Thomas Jungblut
Berlin

Re: Checkpointer Process

Posted by "Edward J. Yoon" <ed...@apache.org>.
P.S., I know this task is not easy. Should we re-scheduling this to 0.5 release?

On Tue, Oct 25, 2011 at 5:22 PM, Edward J. Yoon <ed...@apache.org> wrote:
>> 1.) Checkpointer runs on the same process with bsp task.
>> 2.) A separated checkpointing process per bsp task on each machine.
>> 3.) A separated checkopinting process per machine.
>> 4.) Checkpointing processes in forms of server farm.
>
> When some task fails, the whole tasks will be re-started with previous
> checkpoint data. Right?
>
> I'm +1 for the first idea. I believe this way is simple and reliable.
>
> 2011/10/25 ChiaHung Lin <ch...@nuk.edu.tw>:
>> Just some thoughts on why to programme checkpointer as separated process. The idea is centered on isolation. Because fault will occur, ensuring that failures/ errors would not adversely affect other parts of the system becomes critical. Also, performing user tasks and saving data to hdfs are two different issues so our goal is to ensure user tasks would continuously work even if checkpointing process fails. As long as user tasks keep continuously performing their job smoothly, checkpointing process can be ignored.
>>
>> There were 4 options considered previously:
>>
>> 1.) Checkpointer runs on the same process with bsp task.
>> 2.) A separated checkpointing process per bsp task on each machine.
>> 3.) A separated checkopinting process per machine.
>> 4.) Checkpointing processes in forms of server farm.
>>
>> The problem for the first one is if the checkpoining process fails, user tasks may fail as well, which is an unwanted behaviour for users. The fourth has a problem that it affects arbitrary user tasks for recovery if both processes fail. The second and third is similar except that the second option would min user tasks to be affected if both processes fail. Running checkpointer as separated process has an advantage that if only checkpointing process fails, it is not necessary to recover. For example, suppose a BSP job performs its tasks from supersteps 1 to 10. At the same time a separated checkpointing process stands by. In the first 3 supersteps, both processes work well. After the supersteps 4, the checkpointing process fails, but the user task is continuously doing it task. At the supersteps 7, the checkpointer is back (e.g. restart). And if user task keeps working until it finishes, there is no need to perform recovery in this case. If bsp task fails after checkpointing process is back, the system has chances to recover from the latest snapshot.
>>
>> I understand the current implementation is not perfect. But that would be good if we can work toward this direction because these are recommended to the best of my knowledge.
>>
>> -----Original message-----
>> From:Thomas Jungblut <th...@googlemail.com>
>> To:hama-dev@incubator.apache.org
>> Date:Fri, 14 Oct 2011 15:54:10 +0200
>> Subject:Checkpointer Process
>>
>> Hi all.
>> My idea:
>> Since YARN and multitasking we should consider moving the Checkpointer
>> process into the BSPPeer itself instead of a single process.
>>
>> It would be great if we could discuss what would be the real advantage and
>> disadvantage of integrating it in the same process / a daemon process.
>>
>> --
>> Thomas Jungblut
>> Berlin <th...@gmail.com>
>>
>>
>> --
>> ChiaHung Lin
>> Department of Information Management
>> National University of Kaohsiung
>> Taiwan
>>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Checkpointer Process

Posted by "Edward J. Yoon" <ed...@apache.org>.
> 1.) Checkpointer runs on the same process with bsp task.
> 2.) A separated checkpointing process per bsp task on each machine.
> 3.) A separated checkopinting process per machine.
> 4.) Checkpointing processes in forms of server farm.

When some task fails, the whole tasks will be re-started with previous
checkpoint data. Right?

I'm +1 for the first idea. I believe this way is simple and reliable.

2011/10/25 ChiaHung Lin <ch...@nuk.edu.tw>:
> Just some thoughts on why to programme checkpointer as separated process. The idea is centered on isolation. Because fault will occur, ensuring that failures/ errors would not adversely affect other parts of the system becomes critical. Also, performing user tasks and saving data to hdfs are two different issues so our goal is to ensure user tasks would continuously work even if checkpointing process fails. As long as user tasks keep continuously performing their job smoothly, checkpointing process can be ignored.
>
> There were 4 options considered previously:
>
> 1.) Checkpointer runs on the same process with bsp task.
> 2.) A separated checkpointing process per bsp task on each machine.
> 3.) A separated checkopinting process per machine.
> 4.) Checkpointing processes in forms of server farm.
>
> The problem for the first one is if the checkpoining process fails, user tasks may fail as well, which is an unwanted behaviour for users. The fourth has a problem that it affects arbitrary user tasks for recovery if both processes fail. The second and third is similar except that the second option would min user tasks to be affected if both processes fail. Running checkpointer as separated process has an advantage that if only checkpointing process fails, it is not necessary to recover. For example, suppose a BSP job performs its tasks from supersteps 1 to 10. At the same time a separated checkpointing process stands by. In the first 3 supersteps, both processes work well. After the supersteps 4, the checkpointing process fails, but the user task is continuously doing it task. At the supersteps 7, the checkpointer is back (e.g. restart). And if user task keeps working until it finishes, there is no need to perform recovery in this case. If bsp task fails after checkpointing process is back, the system has chances to recover from the latest snapshot.
>
> I understand the current implementation is not perfect. But that would be good if we can work toward this direction because these are recommended to the best of my knowledge.
>
> -----Original message-----
> From:Thomas Jungblut <th...@googlemail.com>
> To:hama-dev@incubator.apache.org
> Date:Fri, 14 Oct 2011 15:54:10 +0200
> Subject:Checkpointer Process
>
> Hi all.
> My idea:
> Since YARN and multitasking we should consider moving the Checkpointer
> process into the BSPPeer itself instead of a single process.
>
> It would be great if we could discuss what would be the real advantage and
> disadvantage of integrating it in the same process / a daemon process.
>
> --
> Thomas Jungblut
> Berlin <th...@gmail.com>
>
>
> --
> ChiaHung Lin
> Department of Information Management
> National University of Kaohsiung
> Taiwan
>



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Checkpointer Process

Posted by ChiaHung Lin <ch...@nuk.edu.tw>.
Just some thoughts on why to programme checkpointer as separated process. The idea is centered on isolation. Because fault will occur, ensuring that failures/ errors would not adversely affect other parts of the system becomes critical. Also, performing user tasks and saving data to hdfs are two different issues so our goal is to ensure user tasks would continuously work even if checkpointing process fails. As long as user tasks keep continuously performing their job smoothly, checkpointing process can be ignored. 

There were 4 options considered previously:

1.) Checkpointer runs on the same process with bsp task.
2.) A separated checkpointing process per bsp task on each machine. 
3.) A separated checkopinting process per machine.
4.) Checkpointing processes in forms of server farm. 

The problem for the first one is if the checkpoining process fails, user tasks may fail as well, which is an unwanted behaviour for users. The fourth has a problem that it affects arbitrary user tasks for recovery if both processes fail. The second and third is similar except that the second option would min user tasks to be affected if both processes fail. Running checkpointer as separated process has an advantage that if only checkpointing process fails, it is not necessary to recover. For example, suppose a BSP job performs its tasks from supersteps 1 to 10. At the same time a separated checkpointing process stands by. In the first 3 supersteps, both processes work well. After the supersteps 4, the checkpointing process fails, but the user task is continuously doing it task. At the supersteps 7, the checkpointer is back (e.g. restart). And if user task keeps working until it finishes, there is no need to perform recovery in this case. If bsp task fails after checkpointing process is back, the system has chances to recover from the latest snapshot. 

I understand the current implementation is not perfect. But that would be good if we can work toward this direction because these are recommended to the best of my knowledge. 

-----Original message-----
From:Thomas Jungblut <th...@googlemail.com>
To:hama-dev@incubator.apache.org
Date:Fri, 14 Oct 2011 15:54:10 +0200
Subject:Checkpointer Process

Hi all.
My idea:
Since YARN and multitasking we should consider moving the Checkpointer
process into the BSPPeer itself instead of a single process.

It would be great if we could discuss what would be the real advantage and
disadvantage of integrating it in the same process / a daemon process.

-- 
Thomas Jungblut
Berlin <th...@gmail.com>


--
ChiaHung Lin
Department of Information Management
National University of Kaohsiung
Taiwan