Posted to user@hbase.apache.org by Dru Jensen <dr...@gmail.com> on 2008/09/19 19:38:57 UTC
MR process in endless loop
I have a MR process that gets stuck in an endless loop. It looks like
the same set of keys are being sent to one of the tasks in an endless
loop.
Unfortunately, it's not consistent; sometimes it works fine. Only 1
of the 6 MR tasks gets into this state and never completes.
After the disk space on HDFS is used up, the tables become corrupt
and I can no longer recover them.
The main difference from other MR processes I have is that I added a
filter to the MR process table scanner by extending the
TableInputFormatBase:
public class TableInputFormatColumnFilter extends TableInputFormatBase
implements JobConfigurable {...}
and then adding a ColumnValueFilter in the configure() as follows:
ColumnValueFilter rowFilter = new ColumnValueFilter(
    Bytes.toBytes("column_name"),
    ColumnValueFilter.CompareOp.LESS_OR_EQUAL,
    Bytes.toBytes("column_value"));
setRowFilter(rowFilter);
Any ideas what may be causing this?
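For reference, a filter like this presumably compares cell values as raw bytes, in the spirit of HBase's Bytes.compareTo (unsigned lexicographic order). A minimal, self-contained sketch of that comparison; the class name and values below are illustrative only, not taken from HBase:

```java
import java.nio.charset.StandardCharsets;

public class ByteCompareSketch {
    // Unsigned lexicographic compare over byte arrays, similar in spirit
    // to HBase's Bytes.compareTo. Returns <0, 0, or >0.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] nine = "9".getBytes(StandardCharsets.UTF_8);
        byte[] ten = "10".getBytes(StandardCharsets.UTF_8);
        // Byte-wise, "9" sorts AFTER "10", so LESS_OR_EQUAL on
        // string-encoded numbers may not select the rows you expect.
        System.out.println(compare(nine, ten) > 0); // prints true
    }
}
```

One thing worth checking with a filter like this: if "column_value" is a string-encoded number, the byte-wise LESS_OR_EQUAL comparison follows string order, not numeric order.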
Re: Duplicate rows being processed when one MR task completes. Endless Loop in MR task.
Posted by Dru Jensen <dr...@gmail.com>.
St.Ack,
Yes. That is correct. The speculative task also got stuck in the
endless loop. I removed the filter and have not encountered the
endless loop since.
I am in the process of upgrading to Hadoop 0.18.1 and HBase 0.18.0. I
will let you know if I can reproduce this and capture debug info.
thanks,
Dru
On Sep 22, 2008, at 9:24 PM, stack wrote:
> 6 tasks so you have 6 regions in your table Dru?
>
> You might enable DEBUG on the map to check whether the map or the
> filtering is stuck processing the same key over and over (then a
> speculative task starts up because the stuck task is taking too long
> to finish, and so on...)
>
> St.Ack
>
>
> Dru Jensen wrote:
>> More information:
>>
>> When I first launch the Job, 6 MR "tasks" are created on 3
>> different servers in the cluster. Each "task" has 1 "task attempt"
>> started.
>> Hadoop map task list for job_200809191015_0010 on machine1
>>
>> All Tasks
>> Task                            Complete  Start Time            Finish Time                           Counters
>> tip_200809191015_0010_m_000001  100.00%   19-Sep-2008 10:46:51  19-Sep-2008 11:08:56 (22mins, 4sec)   8
>> tip_200809191015_0010_m_000002  100.00%   19-Sep-2008 10:46:52  19-Sep-2008 11:03:41 (16mins, 48sec)  8
>> tip_200809191015_0010_m_000003  100.00%   19-Sep-2008 10:46:53  19-Sep-2008 11:02:15 (15mins, 22sec)  8
>> tip_200809191015_0010_m_000004  100.00%   19-Sep-2008 10:46:53  19-Sep-2008 10:56:14 (9mins, 20sec)   8
>> tip_200809191015_0010_m_000000    0.00%   19-Sep-2008 10:46:51                                        0
>> tip_200809191015_0010_m_000005    0.00%   19-Sep-2008 10:46:55                                        0
>>
>>
>> When 2 of the "tasks" complete, it looks like one of the still
>> running "tasks" gets a new "task attempt" started.
>> Unfortunately, the new "task attempt" is handed the same keys as the
>> first "task attempt", so they are processing the same keys twice.
>>
>> Job job_200809191015_0010
>>
>> All Task Attempts
>> Task Attempts                      Machine   Status   Progress  Start Time            Counters
>> task_200809191015_0010_m_000000_0  machine2  RUNNING  0.00%     19-Sep-2008 10:46:51  0
>> task_200809191015_0010_m_000000_1  machine1  RUNNING  0.00%     19-Sep-2008 11:02:15  0
>>
>> This is also the scenario that is causing the endless loop. Both
>> "task attempts" not only process the same keys, they start
>> processing the same keys over and over in an endless loop.
>>
>>
>> On Sep 19, 2008, at 10:52 AM, Dru Jensen wrote:
>>
>>> Sorry. Hadoop 0.17.2.1 - HBase 0.2.1
>>>
>>> On Sep 19, 2008, at 10:40 AM, Jean-Daniel Cryans wrote:
>>>
>>>> Dru,
>>>>
>>>> Which versions?
>>>>
>>>> Thx
>>>>
>>>> J-D
>>>>
>>>> On Fri, Sep 19, 2008 at 1:38 PM, Dru Jensen <dr...@gmail.com>
>>>> wrote:
>>>>
>>>>> I have a MR process that gets stuck in an endless loop. It
>>>>> looks like the
>>>>> same set of keys are being sent to one of the tasks in an
>>>>> endless loop.
>>>>> Unfortunately, it's not consistent; sometimes it works fine.
>>>>> Only 1 of the
>>>>> 6 MR tasks gets into this state and never completes.
>>>>> After the disk space on HDFS is used up, the tables become
>>>>> corrupt and I
>>>>> can no longer recover them.
>>>>>
>>>>> The main difference from other MR processes I have is that I
>>>>> added a filter
>>>>> to the MR process table scanner by extending the
>>>>> TableInputFormatBase:
>>>>>
>>>>> public class TableInputFormatColumnFilter extends
>>>>> TableInputFormatBase
>>>>> implements JobConfigurable {...}
>>>>>
>>>>> and then adding a ColumnValueFilter in the configure() as follows:
>>>>>
>>>>> ColumnValueFilter rowFilter = new
>>>>> ColumnValueFilter(Bytes.toBytes("column_name"),
>>>>> ColumnValueFilter.CompareOp.LESS_OR_EQUAL,
>>>>> Bytes.toBytes("column_value"));
>>>>> setRowFilter(rowFilter);
>>>>>
>>>>> Any ideas what may be causing this?
>>>
>>
>>
>
Re: Duplicate rows being processed when one MR task completes. Endless
Loop in MR task.
Posted by stack <st...@duboce.net>.
6 tasks so you have 6 regions in your table Dru?
You might enable DEBUG on the map to check whether the map or the
filtering is stuck processing the same key over and over (then a
speculative task starts up because the stuck task is taking too long
to finish, and so on...)
St.Ack
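A sketch of what enabling DEBUG could look like, assuming the stock log4j.properties used by Hadoop/HBase at the time; the mapper package name below is a hypothetical example, not taken from this thread:

```properties
# Hypothetical additions to conf/log4j.properties on the tasktrackers.
# Turn on DEBUG for HBase client/scanner code and for the job's own classes.
log4j.logger.org.apache.hadoop.hbase=DEBUG
log4j.logger.com.example.mr=DEBUG
```

With that in place, each task attempt's stdout/syslog (the "Task Logs" links on the JobTracker page) should show which row keys the scanner is handing to the map, making a repeating key range visible.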
Dru Jensen wrote:
> More information:
>
> When I first launch the Job, 6 MR "tasks" are created on 3 different
> servers in the cluster. Each "task" has 1 "task attempt" started.
> Hadoop map task list for job_200809191015_0010 on machine1
>
> All Tasks
> Task                            Complete  Start Time            Finish Time                           Counters
> tip_200809191015_0010_m_000001  100.00%   19-Sep-2008 10:46:51  19-Sep-2008 11:08:56 (22mins, 4sec)   8
> tip_200809191015_0010_m_000002  100.00%   19-Sep-2008 10:46:52  19-Sep-2008 11:03:41 (16mins, 48sec)  8
> tip_200809191015_0010_m_000003  100.00%   19-Sep-2008 10:46:53  19-Sep-2008 11:02:15 (15mins, 22sec)  8
> tip_200809191015_0010_m_000004  100.00%   19-Sep-2008 10:46:53  19-Sep-2008 10:56:14 (9mins, 20sec)   8
> tip_200809191015_0010_m_000000    0.00%   19-Sep-2008 10:46:51                                        0
> tip_200809191015_0010_m_000005    0.00%   19-Sep-2008 10:46:55                                        0
>
>
> When 2 of the "tasks" complete, it looks like one of the still running
> "tasks" gets a new "task attempt" started.
> Unfortunately, the new "task attempt" is handed the same keys as the
> first "task attempt", so they are processing the same keys twice.
>
> Job job_200809191015_0010
>
> All Task Attempts
> Task Attempts                      Machine   Status   Progress  Start Time            Counters
> task_200809191015_0010_m_000000_0  machine2  RUNNING  0.00%     19-Sep-2008 10:46:51  0
> task_200809191015_0010_m_000000_1  machine1  RUNNING  0.00%     19-Sep-2008 11:02:15  0
>
> This is also the scenario that is causing the endless loop. Both
> "task attempts" not only process the same keys, they start processing
> the same keys over and over in an endless loop.
>
>
> On Sep 19, 2008, at 10:52 AM, Dru Jensen wrote:
>
>> Sorry. Hadoop 0.17.2.1 - HBase 0.2.1
>>
>> On Sep 19, 2008, at 10:40 AM, Jean-Daniel Cryans wrote:
>>
>>> Dru,
>>>
>>> Which versions?
>>>
>>> Thx
>>>
>>> J-D
>>>
>>> On Fri, Sep 19, 2008 at 1:38 PM, Dru Jensen <dr...@gmail.com>
>>> wrote:
>>>
>>>> I have a MR process that gets stuck in an endless loop. It looks
>>>> like the
>>>> same set of keys are being sent to one of the tasks in an endless
>>>> loop.
>>>> Unfortunately, it's not consistent; sometimes it works fine. Only
>>>> 1 of the
>>>> 6 MR tasks gets into this state and never completes.
>>>> After the disk space on HDFS is used up, the tables become
>>>> corrupt and I
>>>> can no longer recover them.
>>>>
>>>> The main difference from other MR processes I have is that I added
>>>> a filter
>>>> to the MR process table scanner by extending the TableInputFormatBase:
>>>>
>>>> public class TableInputFormatColumnFilter extends TableInputFormatBase
>>>> implements JobConfigurable {...}
>>>>
>>>> and then adding a ColumnValueFilter in the configure() as follows:
>>>>
>>>> ColumnValueFilter rowFilter = new
>>>> ColumnValueFilter(Bytes.toBytes("column_name"),
>>>> ColumnValueFilter.CompareOp.LESS_OR_EQUAL,
>>>> Bytes.toBytes("column_value"));
>>>> setRowFilter(rowFilter);
>>>>
>>>> Any ideas what may be causing this?
>>
>
>
Duplicate rows being processed when one MR task completes. Endless Loop in MR task.
Posted by Dru Jensen <dr...@gmail.com>.
More information:
When I first launch the Job, 6 MR "tasks" are created on 3 different
servers in the cluster. Each "task" has 1 "task attempt" started.
Hadoop map task list for job_200809191015_0010 on machine1

All Tasks
Task                            Complete  Start Time            Finish Time                           Counters
tip_200809191015_0010_m_000001  100.00%   19-Sep-2008 10:46:51  19-Sep-2008 11:08:56 (22mins, 4sec)   8
tip_200809191015_0010_m_000002  100.00%   19-Sep-2008 10:46:52  19-Sep-2008 11:03:41 (16mins, 48sec)  8
tip_200809191015_0010_m_000003  100.00%   19-Sep-2008 10:46:53  19-Sep-2008 11:02:15 (15mins, 22sec)  8
tip_200809191015_0010_m_000004  100.00%   19-Sep-2008 10:46:53  19-Sep-2008 10:56:14 (9mins, 20sec)   8
tip_200809191015_0010_m_000000    0.00%   19-Sep-2008 10:46:51                                        0
tip_200809191015_0010_m_000005    0.00%   19-Sep-2008 10:46:55                                        0
When 2 of the "tasks" complete, it looks like one of the still running
"tasks" gets a new "task attempt" started.
Unfortunately, the new "task attempt" is handed the same keys as the
first "task attempt", so they are processing the same keys twice.
Job job_200809191015_0010

All Task Attempts
Task Attempts                      Machine   Status   Progress  Start Time            Counters
task_200809191015_0010_m_000000_0  machine2  RUNNING  0.00%     19-Sep-2008 10:46:51  0
task_200809191015_0010_m_000000_1  machine1  RUNNING  0.00%     19-Sep-2008 11:02:15  0
This is also the scenario that is causing the endless loop. Both
"task attempts" not only process the same keys, they start processing
the same keys over and over in an endless loop.
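The behavior above, an attempt re-reading the same keys repeatedly, is what you would see if the scan were being restarted from the beginning of its range after a failure instead of resuming after the last row returned. A toy, self-contained model of that failure mode; this is a hypothetical illustration, not HBase's actual scanner or record-reader code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ScannerRestartSketch {
    // Toy model: a reader whose scanner is restarted from the FIRST row
    // after every `failAfter` reads, instead of resuming after the last
    // row returned. `maxRestarts` bounds the loop so the demo terminates.
    static List<String> readWithRestart(List<String> rows, int failAfter,
                                        int maxRestarts) {
        List<String> seen = new ArrayList<>();
        Iterator<String> scanner = rows.iterator();
        int sinceRestart = 0;
        int restarts = 0;
        while (scanner.hasNext()) {
            if (sinceRestart == failAfter && restarts < maxRestarts) {
                scanner = rows.iterator(); // buggy restart: back to row one
                sinceRestart = 0;
                restarts++;
            }
            seen.add(scanner.next());
            sinceRestart++;
        }
        return seen;
    }

    public static void main(String[] args) {
        // One restart after every 2 reads re-emits "a" and "b".
        System.out.println(readWithRestart(Arrays.asList("a", "b", "c"), 2, 1));
        // prints [a, b, a, b, c]
    }
}
```

With rows [a, b, c], a single restart re-emits a and b before c is reached; with unbounded restarts the reader never reaches c at all, which matches both symptoms in this thread: duplicate key processing and a task that never completes.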
On Sep 19, 2008, at 10:52 AM, Dru Jensen wrote:
> Sorry. Hadoop 0.17.2.1 - HBase 0.2.1
>
> On Sep 19, 2008, at 10:40 AM, Jean-Daniel Cryans wrote:
>
>> Dru,
>>
>> Which versions?
>>
>> Thx
>>
>> J-D
>>
>> On Fri, Sep 19, 2008 at 1:38 PM, Dru Jensen <dr...@gmail.com>
>> wrote:
>>
>>> I have a MR process that gets stuck in an endless loop. It looks
>>> like the
>>> same set of keys are being sent to one of the tasks in an endless
>>> loop.
>>> Unfortunately, it's not consistent; sometimes it works fine. Only
>>> 1 of the
>>> 6 MR tasks gets into this state and never completes.
>>> After the disk space on HDFS is used up, the tables become
>>> corrupt and I
>>> can no longer recover them.
>>>
>>> The main difference from other MR processes I have is that I added
>>> a filter
>>> to the MR process table scanner by extending the
>>> TableInputFormatBase:
>>>
>>> public class TableInputFormatColumnFilter extends
>>> TableInputFormatBase
>>> implements JobConfigurable {...}
>>>
>>> and then adding a ColumnValueFilter in the configure() as follows:
>>>
>>> ColumnValueFilter rowFilter = new
>>> ColumnValueFilter(Bytes.toBytes("column_name"),
>>> ColumnValueFilter.CompareOp.LESS_OR_EQUAL,
>>> Bytes.toBytes("column_value"));
>>> setRowFilter(rowFilter);
>>>
>>> Any ideas what may be causing this?
>
Re: MR process in endless loop
Posted by Dru Jensen <dr...@gmail.com>.
Sorry. Hadoop 0.17.2.1 - HBase 0.2.1
On Sep 19, 2008, at 10:40 AM, Jean-Daniel Cryans wrote:
> Dru,
>
> Which versions?
>
> Thx
>
> J-D
>
> On Fri, Sep 19, 2008 at 1:38 PM, Dru Jensen <dr...@gmail.com>
> wrote:
>
>> I have a MR process that gets stuck in an endless loop. It looks
>> like the
>> same set of keys are being sent to one of the tasks in an endless
>> loop.
>> Unfortunately, it's not consistent; sometimes it works fine. Only
>> 1 of the
>> 6 MR tasks gets into this state and never completes.
>> After the disk space on HDFS is used up, the tables become
>> corrupt and I
>> can no longer recover them.
>>
>> The main difference from other MR processes I have is that I added
>> a filter
>> to the MR process table scanner by extending the
>> TableInputFormatBase:
>>
>> public class TableInputFormatColumnFilter extends
>> TableInputFormatBase
>> implements JobConfigurable {...}
>>
>> and then adding a ColumnValueFilter in the configure() as follows:
>>
>> ColumnValueFilter rowFilter = new
>> ColumnValueFilter(Bytes.toBytes("column_name"),
>> ColumnValueFilter.CompareOp.LESS_OR_EQUAL,
>> Bytes.toBytes("column_value"));
>> setRowFilter(rowFilter);
>>
>> Any ideas what may be causing this?
Re: MR process in endless loop
Posted by Jean-Daniel Cryans <jd...@apache.org>.
Dru,
Which versions?
Thx
J-D
On Fri, Sep 19, 2008 at 1:38 PM, Dru Jensen <dr...@gmail.com> wrote:
> I have a MR process that gets stuck in an endless loop. It looks like the
> same set of keys are being sent to one of the tasks in an endless loop.
> Unfortunately, it's not consistent; sometimes it works fine. Only 1 of the
> 6 MR tasks gets into this state and never completes.
> After the disk space on HDFS is used up, the tables become corrupt and I
> can no longer recover them.
>
> The main difference from other MR processes I have is that I added a filter
> to the MR process table scanner by extending the TableInputFormatBase:
>
> public class TableInputFormatColumnFilter extends TableInputFormatBase
> implements JobConfigurable {...}
>
> and then adding a ColumnValueFilter in the configure() as follows:
>
> ColumnValueFilter rowFilter = new
> ColumnValueFilter(Bytes.toBytes("column_name"),
> ColumnValueFilter.CompareOp.LESS_OR_EQUAL, Bytes.toBytes("column_value"));
> setRowFilter(rowFilter);
>
> Any ideas what may be causing this?