Posted to user@hbase.apache.org by Dru Jensen <dr...@gmail.com> on 2008/09/19 19:38:57 UTC

MR process in endless loop

I have a MR process that gets stuck in an endless loop.  It looks like
the same set of keys are being sent to one of the tasks in an endless
loop.
Unfortunately, it's not consistent.  Sometimes it works fine.  Only 1
of the 6 MR processes gets in this state and never completes.
After the disk space is used up on HDFS, the tables become corrupt
and I can no longer recover them.

The main difference from other MR processes I have is that I added a  
filter to the MR process table scanner by extending the  
TableInputFormatBase:

public class TableInputFormatColumnFilter extends TableInputFormatBase  
implements JobConfigurable {...}

and then adding a ColumnValueFilter in the configure() as follows:

     ColumnValueFilter rowFilter = new ColumnValueFilter(
         Bytes.toBytes("column_name"),
         ColumnValueFilter.CompareOp.LESS_OR_EQUAL,
         Bytes.toBytes("column_value"));
     setRowFilter(rowFilter);

Any ideas what may be causing this?
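[For reference, the full subclass sketched above would look roughly like the following. This is only a sketch assuming the HBase 0.2-era API (TableInputFormatBase exposing setRowFilter(), configure(JobConf) coming from JobConfigurable); the column name and value are placeholders, not real identifiers from the job.]

```java
// Hypothetical fleshed-out version of the snippet above.
// Class names and API calls assume HBase 0.2.x / Hadoop 0.17.x;
// "column_name" and "column_value" are placeholders.
public class TableInputFormatColumnFilter extends TableInputFormatBase
    implements JobConfigurable {

  public void configure(JobConf job) {
    // Only pass rows whose column value is <= the given value.
    ColumnValueFilter rowFilter = new ColumnValueFilter(
        Bytes.toBytes("column_name"),
        ColumnValueFilter.CompareOp.LESS_OR_EQUAL,
        Bytes.toBytes("column_value"));
    setRowFilter(rowFilter);
  }
}
```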

Re: Duplicate rows being processed when one MR task completes. Endless Loop in MR task.

Posted by Dru Jensen <dr...@gmail.com>.
St.Ack,

Yes.  That is correct. The speculative task also got stuck in the  
endless loop.  I removed the filter and have not encountered the  
endless loop since.

I am in the process of upgrading to hadoop 0.18.1 and hbase 0.18.0.  I
will let you know if I can duplicate this and capture debug info.

thanks,
Dru

On Sep 22, 2008, at 9:24 PM, stack wrote:

> 6 tasks so you have 6 regions in your table Dru?
>
> You might enable DEBUG to the map to see that the map or the  
> filtering to ensure its not stuck processing same key somehow over  
> and over (then a speculative task starts up because the stuck task  
> is taking too long to finish, and so on...)
>
> St.Ack
>
>
> Dru Jensen wrote:
>> More information:
>>
>> When I first launch the Job, 6 MR "tasks" are created on 3  
>> different servers in the cluster. Each "task" has 1 "task attempt"  
>> started.
>> Hadoop map task list for job_200809191015_0010 on machine1
>>
>> All Tasks
>> Task    Complete    Status    Start Time    Finish Time     
>> Errors    Counters
>> tip_200809191015_0010_m_000001    100.00%
>>
>> 19-Sep-2008 10:46:51
>> 19-Sep-2008 11:08:56 (22mins, 4sec)
>>
>> 8
>> tip_200809191015_0010_m_000002    100.00%
>>
>> 19-Sep-2008 10:46:52
>> 19-Sep-2008 11:03:41 (16mins, 48sec)
>>
>> 8
>> tip_200809191015_0010_m_000003    100.00%
>>
>> 19-Sep-2008 10:46:53
>> 19-Sep-2008 11:02:15 (15mins, 22sec)
>>
>> 8
>> tip_200809191015_0010_m_000004    100.00%
>>
>> 19-Sep-2008 10:46:53
>> 19-Sep-2008 10:56:14 (9mins, 20sec)
>>
>> 8
>> tip_200809191015_0010_m_000000    0.00%
>>
>> 19-Sep-2008 10:46:51
>>
>>
>> 0
>> tip_200809191015_0010_m_000005    0.00%
>>
>> 19-Sep-2008 10:46:55
>>
>>
>> 0
>>
>> Go back to JobTracker
>> Hadoop, 2008.
>>
>>
>> When 2 of the"tasks" complete, it looks like one of the still  
>> running "tasks" gets a new "task attempt" started.
>> Unfortunately the new "task attempt" is handed the same keys as the  
>> first "task attempt" so they are processing the same keys twice.
>>
>> Job job_200809191015_0010
>>
>> All Task Attempts
>> Task Attempts    Machine    Status    Progress    Start Time     
>> Finish Time    Errors    Task Logs    Counters    Actions
>> task_200809191015_0010_m_000000_0    machine2    RUNNING    0.00%
>> 19-Sep-2008 10:46:51
>> Last 4KB
>> Last 8KB
>> All
>> 0
>> task_200809191015_0010_m_000000_1    machine1    RUNNING    0.00%
>> 19-Sep-2008 11:02:15
>> Last 4KB
>> Last 8KB
>> All
>> 0
>>
>> Go back to the job
>> Go back to JobTracker
>> Hadoop, 2008.
>>
>> This is also the scenario that is causing the endless loop.  Both  
>> "task attempts" not only process the same keys, they start  
>> processing the same keys over and over in an endless loop.
>>
>>
>> On Sep 19, 2008, at 10:52 AM, Dru Jensen wrote:
>>
>>> Sorry.  Hadoop 0.17.2.1 - Hbase 0.2.1
>>>
>>> On Sep 19, 2008, at 10:40 AM, Jean-Daniel Cryans wrote:
>>>
>>>> Dru,
>>>>
>>>> Which versions?
>>>>
>>>> Thx
>>>>
>>>> J-D
>>>>
>>>> On Fri, Sep 19, 2008 at 1:38 PM, Dru Jensen <dr...@gmail.com>  
>>>> wrote:
>>>>
>>>>> I have a MR process that gets stuck in an endless loop.  It  
>>>>> looks like the
>>>>> same set of keys are being sent to one of the tasks in an  
>>>>> endless loop.
>>>>> Unfortunately, Its not consistent.  Sometimes it works fine.   
>>>>> Only 1 of the
>>>>> 6 MR processes gets in this state and never completes.
>>>>> After the disk space is used up on the HFS, the tables become  
>>>>> corrupt and I
>>>>> can no longer recover them.
>>>>>
>>>>> The main difference from other MR processes I have is that I  
>>>>> added a filter
>>>>> to the MR process table scanner by extending the  
>>>>> TableInputFormatBase:
>>>>>
>>>>> public class TableInputFormatColumnFilter extends  
>>>>> TableInputFormatBase
>>>>> implements JobConfigurable {...}
>>>>>
>>>>> and then adding a ColumnValueFilter in the configure() as follows:
>>>>>
>>>>> ColumnValueFilter rowFilter = new
>>>>> ColumnValueFilter(Bytes.toBytes("column_name"),
>>>>> ColumnValueFilter.CompareOp.LESS_OR_EQUAL,  
>>>>> Bytes.toBytes("column_value"));
>>>>> setRowFilter(rowFilter);
>>>>>
>>>>> Any ideas what may be causing this?
>>>
>>
>>
>


Re: Duplicate rows being processed when one MR task completes. Endless Loop in MR task.

Posted by stack <st...@duboce.net>.
6 tasks so you have 6 regions in your table Dru?

You might enable DEBUG on the map task to check the map or the filtering,
to ensure it's not stuck processing the same key somehow over and over
(then a speculative task starts up because the stuck task is taking too
long to finish, and so on...)
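[A minimal way to turn that on, assuming the stock conf/log4j.properties layout shipped with this era of Hadoop/HBase; the logger names below are the usual package prefixes and have not been verified against these exact versions:]

```properties
# Hypothetical log4j.properties additions; restart the affected
# daemons (or resubmit the job) after editing for them to take effect.
log4j.logger.org.apache.hadoop.hbase=DEBUG
log4j.logger.org.apache.hadoop.mapred=DEBUG
```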

St.Ack


Dru Jensen wrote:
> More information:
>
> When I first launch the Job, 6 MR "tasks" are created on 3 different 
> servers in the cluster. Each "task" has 1 "task attempt" started.
> Hadoop map task list for job_200809191015_0010 on machine1
>
> All Tasks
> Task    Complete    Status    Start Time    Finish Time    Errors    
> Counters
> tip_200809191015_0010_m_000001    100.00%
>
> 19-Sep-2008 10:46:51
> 19-Sep-2008 11:08:56 (22mins, 4sec)
>
> 8
> tip_200809191015_0010_m_000002    100.00%
>
> 19-Sep-2008 10:46:52
> 19-Sep-2008 11:03:41 (16mins, 48sec)
>
> 8
> tip_200809191015_0010_m_000003    100.00%
>
> 19-Sep-2008 10:46:53
> 19-Sep-2008 11:02:15 (15mins, 22sec)
>
> 8
> tip_200809191015_0010_m_000004    100.00%
>
> 19-Sep-2008 10:46:53
> 19-Sep-2008 10:56:14 (9mins, 20sec)
>
> 8
> tip_200809191015_0010_m_000000    0.00%
>
> 19-Sep-2008 10:46:51
>
>
> 0
> tip_200809191015_0010_m_000005    0.00%
>
> 19-Sep-2008 10:46:55
>
>
> 0
>
> Go back to JobTracker
> Hadoop, 2008.
>
>
> When 2 of the"tasks" complete, it looks like one of the still running 
> "tasks" gets a new "task attempt" started.
> Unfortunately the new "task attempt" is handed the same keys as the 
> first "task attempt" so they are processing the same keys twice.
>
> Job job_200809191015_0010
>
> All Task Attempts
> Task Attempts    Machine    Status    Progress    Start Time    Finish 
> Time    Errors    Task Logs    Counters    Actions
> task_200809191015_0010_m_000000_0    machine2    RUNNING    0.00%
> 19-Sep-2008 10:46:51       
>
> Last 4KB
> Last 8KB
> All
> 0   
>
> task_200809191015_0010_m_000000_1    machine1    RUNNING    0.00%
> 19-Sep-2008 11:02:15       
>
> Last 4KB
> Last 8KB
> All
> 0   
>
>
> Go back to the job
> Go back to JobTracker
> Hadoop, 2008.
>
> This is also the scenario that is causing the endless loop.  Both 
> "task attempts" not only process the same keys, they start processing 
> the same keys over and over in an endless loop.
>
>
> On Sep 19, 2008, at 10:52 AM, Dru Jensen wrote:
>
>> Sorry.  Hadoop 0.17.2.1 - Hbase 0.2.1
>>
>> On Sep 19, 2008, at 10:40 AM, Jean-Daniel Cryans wrote:
>>
>>> Dru,
>>>
>>> Which versions?
>>>
>>> Thx
>>>
>>> J-D
>>>
>>> On Fri, Sep 19, 2008 at 1:38 PM, Dru Jensen <dr...@gmail.com> 
>>> wrote:
>>>
>>>> I have a MR process that gets stuck in an endless loop.  It looks 
>>>> like the
>>>> same set of keys are being sent to one of the tasks in an endless 
>>>> loop.
>>>> Unfortunately, Its not consistent.  Sometimes it works fine.  Only 
>>>> 1 of the
>>>> 6 MR processes gets in this state and never completes.
>>>> After the disk space is used up on the HFS, the tables become 
>>>> corrupt and I
>>>> can no longer recover them.
>>>>
>>>> The main difference from other MR processes I have is that I added 
>>>> a filter
>>>> to the MR process table scanner by extending the TableInputFormatBase:
>>>>
>>>> public class TableInputFormatColumnFilter extends TableInputFormatBase
>>>> implements JobConfigurable {...}
>>>>
>>>> and then adding a ColumnValueFilter in the configure() as follows:
>>>>
>>>>  ColumnValueFilter rowFilter = new
>>>> ColumnValueFilter(Bytes.toBytes("column_name"),
>>>> ColumnValueFilter.CompareOp.LESS_OR_EQUAL, 
>>>> Bytes.toBytes("column_value"));
>>>>  setRowFilter(rowFilter);
>>>>
>>>> Any ideas what may be causing this?
>>
>
>


Duplicate rows being processed when one MR task completes. Endless Loop in MR task.

Posted by Dru Jensen <dr...@gmail.com>.
More information:

When I first launch the Job, 6 MR "tasks" are created on 3 different  
servers in the cluster. Each "task" has 1 "task attempt" started.
Hadoop map task list for job_200809191015_0010 on machine1

All Tasks
Task                             Complete  Start Time            Finish Time                           Counters
tip_200809191015_0010_m_000001   100.00%   19-Sep-2008 10:46:51  19-Sep-2008 11:08:56 (22mins, 4sec)   8
tip_200809191015_0010_m_000002   100.00%   19-Sep-2008 10:46:52  19-Sep-2008 11:03:41 (16mins, 48sec)  8
tip_200809191015_0010_m_000003   100.00%   19-Sep-2008 10:46:53  19-Sep-2008 11:02:15 (15mins, 22sec)  8
tip_200809191015_0010_m_000004   100.00%   19-Sep-2008 10:46:53  19-Sep-2008 10:56:14 (9mins, 20sec)   8
tip_200809191015_0010_m_000000     0.00%   19-Sep-2008 10:46:51  -                                     0
tip_200809191015_0010_m_000005     0.00%   19-Sep-2008 10:46:55  -                                     0



When 2 of the "tasks" complete, it looks like one of the still running
"tasks" gets a new "task attempt" started.
Unfortunately, the new "task attempt" is handed the same keys as the
first "task attempt", so they are processing the same keys twice.

Job job_200809191015_0010

All Task Attempts
Task Attempts                       Machine   Status   Progress  Start Time            Task Logs             Counters
task_200809191015_0010_m_000000_0   machine2  RUNNING  0.00%     19-Sep-2008 10:46:51  Last 4KB / 8KB / All  0
task_200809191015_0010_m_000000_1   machine1  RUNNING  0.00%     19-Sep-2008 11:02:15  Last 4KB / 8KB / All  0



This is also the scenario that is causing the endless loop.  Both  
"task attempts" not only process the same keys, they start processing  
the same keys over and over in an endless loop.


On Sep 19, 2008, at 10:52 AM, Dru Jensen wrote:

> Sorry.  Hadoop 0.17.2.1 - Hbase 0.2.1
>
> On Sep 19, 2008, at 10:40 AM, Jean-Daniel Cryans wrote:
>
>> Dru,
>>
>> Which versions?
>>
>> Thx
>>
>> J-D
>>
>> On Fri, Sep 19, 2008 at 1:38 PM, Dru Jensen <dr...@gmail.com>  
>> wrote:
>>
>>> I have a MR process that gets stuck in an endless loop.  It looks  
>>> like the
>>> same set of keys are being sent to one of the tasks in an endless  
>>> loop.
>>> Unfortunately, Its not consistent.  Sometimes it works fine.  Only  
>>> 1 of the
>>> 6 MR processes gets in this state and never completes.
>>> After the disk space is used up on the HFS, the tables become  
>>> corrupt and I
>>> can no longer recover them.
>>>
>>> The main difference from other MR processes I have is that I added  
>>> a filter
>>> to the MR process table scanner by extending the  
>>> TableInputFormatBase:
>>>
>>> public class TableInputFormatColumnFilter extends  
>>> TableInputFormatBase
>>> implements JobConfigurable {...}
>>>
>>> and then adding a ColumnValueFilter in the configure() as follows:
>>>
>>>  ColumnValueFilter rowFilter = new
>>> ColumnValueFilter(Bytes.toBytes("column_name"),
>>> ColumnValueFilter.CompareOp.LESS_OR_EQUAL,  
>>> Bytes.toBytes("column_value"));
>>>  setRowFilter(rowFilter);
>>>
>>> Any ideas what may be causing this?
>


Re: MR process in endless loop

Posted by Dru Jensen <dr...@gmail.com>.
Sorry.  Hadoop 0.17.2.1 - Hbase 0.2.1

On Sep 19, 2008, at 10:40 AM, Jean-Daniel Cryans wrote:

> Dru,
>
> Which versions?
>
> Thx
>
> J-D
>
> On Fri, Sep 19, 2008 at 1:38 PM, Dru Jensen <dr...@gmail.com>  
> wrote:
>
>> I have a MR process that gets stuck in an endless loop.  It looks  
>> like the
>> same set of keys are being sent to one of the tasks in an endless  
>> loop.
>> Unfortunately, Its not consistent.  Sometimes it works fine.  Only  
>> 1 of the
>> 6 MR processes gets in this state and never completes.
>> After the disk space is used up on the HFS, the tables become  
>> corrupt and I
>> can no longer recover them.
>>
>> The main difference from other MR processes I have is that I added  
>> a filter
>> to the MR process table scanner by extending the  
>> TableInputFormatBase:
>>
>> public class TableInputFormatColumnFilter extends  
>> TableInputFormatBase
>> implements JobConfigurable {...}
>>
>> and then adding a ColumnValueFilter in the configure() as follows:
>>
>>   ColumnValueFilter rowFilter = new
>> ColumnValueFilter(Bytes.toBytes("column_name"),
>> ColumnValueFilter.CompareOp.LESS_OR_EQUAL,  
>> Bytes.toBytes("column_value"));
>>   setRowFilter(rowFilter);
>>
>> Any ideas what may be causing this?


Re: MR process in endless loop

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Dru,

Which versions?

Thx

J-D

On Fri, Sep 19, 2008 at 1:38 PM, Dru Jensen <dr...@gmail.com> wrote:

> I have a MR process that gets stuck in an endless loop.  It looks like the
> same set of keys are being sent to one of the tasks in an endless loop.
> Unfortunately, Its not consistent.  Sometimes it works fine.  Only 1 of the
> 6 MR processes gets in this state and never completes.
> After the disk space is used up on the HFS, the tables become corrupt and I
> can no longer recover them.
>
> The main difference from other MR processes I have is that I added a filter
> to the MR process table scanner by extending the TableInputFormatBase:
>
> public class TableInputFormatColumnFilter extends TableInputFormatBase
> implements JobConfigurable {...}
>
> and then adding a ColumnValueFilter in the configure() as follows:
>
>    ColumnValueFilter rowFilter = new
> ColumnValueFilter(Bytes.toBytes("column_name"),
> ColumnValueFilter.CompareOp.LESS_OR_EQUAL, Bytes.toBytes("column_value"));
>    setRowFilter(rowFilter);
>
> Any ideas what may be causing this?