Posted to user@hbase.apache.org by Jean-Marc Spaggiari <je...@spaggiari.org> on 2012/12/16 13:52:13 UTC

MR missing lines

Hi,

I have a table on which I run an MR job each time it exceeds 100,000 rows.

When the target is reached, all the feeding processes are stopped.

Yesterday it reached 123608 rows. So I stopped the feeding processes
and ran the MR job.

For each row, the MR job creates a Delete. The Delete is placed on a
list, and when the list reaches 10 elements, it is sent to the table.
In the cleanup method, the list is sent to the table if there is any
element left in it.
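
To make the batching concrete, here is a minimal plain-Java sketch of the pattern described above: buffer one item per row, flush when the buffer holds 10 elements, and flush once more at cleanup. BatchBuffer and its methods are hypothetical names, not HBase classes, and the sink stands in for the table's batch delete call; the lambda-friendly Consumer type assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of HBase: buffers items and hands them to a
// sink in batches. In the real job the sink would be the table's batch delete.
final class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Called once per input row (the map() step).
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Called when a batch is full, and once more at the end (the cleanup() step).
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // pass a copy so the sink cannot alter the buffer
        pending.clear();
    }

    // Number of items added but not yet handed to the sink.
    int pendingCount() {
        return pending.size();
    }
}
```

With 38 rows and a batch size of 10, the sink receives three full batches during the run and one batch of 8 from the final flush. If that final flush is skipped, 8 rows stay behind.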

So at the end of the MR job, I should have an empty table.

The table is split over 128 regions, and I have 8 region servers.

What bothers me is that after the MR job, I had 38 rows remaining
in the table. The MR job took 348 minutes to run. So I ran it again,
which this time took 2 minutes, and now I have 1 row remaining in the
table.

I looked at the logs (for the 38-row run) and there is nothing in
them. There are some scanner timeout exceptions for the run on the 100K
rows.

I'm running HBase 0.94.3.

I will have another 100K rows today, so I will re-run the job. I will
increase the timeout to make sure I get no exceptions, but even when I
ran the job on the 38 rows with no exception, one row remained...

Any idea why, and where I can search? It's not really an issue for me
since I can just re-run the job, but this might be an issue for
others.

JM

Re: MR missing lines

Posted by Kevin O'dell <ke...@cloudera.com>.

It is probably best to add some client-side logging to the MR job to track
this down.

Re: MR missing lines

Posted by Harsh J <ha...@cloudera.com>.

You can use MR counters to count your overall Deletes, to see if they
match your table count. Also, does your job input record count match
the expected count of the table you intended to clear?
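
As a rough illustration of this check, the plain-Java sketch below keeps two tallies and derives the two numbers worth comparing. DeleteAudit and its counter names are hypothetical; in the actual job the tallies would be MR counters incremented through context.getCounter(...), and the table row count would come from a separate count of the table.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical bookkeeping class, not an HBase or Hadoop API.
final class DeleteAudit {
    enum Counter { INPUT_RECORDS, DELETES_ISSUED } // illustrative counter names

    private final Map<Counter, Long> counts = new EnumMap<>(Counter.class);

    void increment(Counter counter) {
        counts.merge(counter, 1L, Long::sum);
    }

    long get(Counter counter) {
        return counts.getOrDefault(counter, 0L);
    }

    // Rows the job never read: table row count minus records read by the job.
    long unseenRows(long tableRowCount) {
        return tableRowCount - get(Counter.INPUT_RECORDS);
    }

    // Rows the job read but issued no delete for.
    long rowsWithoutDelete() {
        return get(Counter.INPUT_RECORDS) - get(Counter.DELETES_ISSUED);
    }
}
```

A non-zero unseenRows points at the scan side of the job (rows never delivered to map()), while a non-zero rowsWithoutDelete points at the mapper logic.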

-- 
Harsh J

RE: MR missing lines

Posted by Anoop Sam John <an...@huawei.com>.

Hi all,

Be careful when choosing between Delete#deleteColumn() and Delete#deleteColumns().
deleteColumn() deletes just one version of a column in a given row, while deleteColumns() deletes all versions of the column.
In Jean's case, which API is used does not matter functionally, as he has only one version per column and only one column in every row.

But deleteColumn() carries an overhead. When it is called without a timestamp (the latest timestamp is used by default), a Get is performed within the HRegion to find the timestamp of the most recent version of this column. deleteColumn(cf, qualifier) deletes the most recent version of cf:qualifier, while deleteColumns(cf, qualifier) deletes the whole column from the row (all versions).
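
The difference can be pictured with a toy in-memory model of a single cell's versions. This is only a sketch of the semantics described above, not HBase code: VersionedCell and its methods are hypothetical, and real HBase applies deletes through tombstone markers rather than by removing entries from a map.

```java
import java.util.TreeMap;

// Toy model of one cell's versions (timestamp -> value). Not HBase code.
final class VersionedCell {
    private final TreeMap<Long, String> versions = new TreeMap<>();

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Models deleteColumn(cf, qualifier) without a timestamp: the latest
    // timestamp has to be looked up first (the extra read), then only that
    // version is removed.
    void deleteLatestVersion() {
        if (!versions.isEmpty()) {
            versions.remove(versions.lastKey());
        }
    }

    // Models deleteColumns(cf, qualifier): every version goes, no lookup needed.
    void deleteAllVersions() {
        versions.clear();
    }

    // Value a reader would see: the newest surviving version, or null if none.
    String read() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }
}
```

With two versions stored, deleteLatestVersion() leaves the older value readable, whereas deleteAllVersions() leaves nothing. With a single version per column, as in Jean's table, both end in the same state, which is why only the cost differs.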

-Anoop-

Re: MR missing lines

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

Hi Anoop,

Thanks for the hint! Even if it doesn't fix my issue, at least my
tests are going to be faster.

I will take a look at the documentation to understand what
deleteColumn was doing.

JM

RE: MR missing lines

Posted by Anoop Sam John <an...@huawei.com>.

Jean: just one thought after seeing the description and the code. It is not related to the missing rows as such.

You want to delete the row fully, right?
>My table is only one CF with one C with one version
And your code is like
>  Delete delete_entry_proposed = new Delete(key);
>  delete_entry_proposed.deleteColumn(KVs.get(0).getFamily(), KVs.get(0).getQualifier());

deleteColumn() is useful when you want to delete a specific version of a specific column in a row. In your case it may not be needed at all. Just Delete delete_entry_proposed = new Delete(key); may be enough, so that the delete type is a ROW delete.

You can see in the javadoc of the deleteColumn() API that it clearly says it is an expensive op. On the server side, a Get call is needed.
In your case this is really unwanted overhead, I think...

-Anoop-

Re: MR missing lines

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

I faced the issue again today...

RowCounter gave me 104313 rows.
Here is the output of the job counters:
12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_ADDED=81594
12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_SIMILAR=434
12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_NO_CHANGES=14250
12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_DUPLICATE=428
12/12/17 22:32:52 INFO mapred.JobClient:     NON_DELETED_ROWS=0
12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_EXISTING=7605
12/12/17 22:32:52 INFO mapred.JobClient:     ROWS_PARSED=104311

There is a 2-row difference between ROWS_PARSED and the RowCounter result.
ENTRY_ADDED, ENTRY_SIMILAR, ENTRY_NO_CHANGES, ENTRY_DUPLICATE and
ENTRY_EXISTING are the 5 states an entry can have. The total of all those
counters is equal to the ROWS_PARSED value, so they are aligned. The code
handles all the possibilities.
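
For anyone who wants to check the arithmetic, here is a small Java sketch that adds up the five state counters and compares the result with the two totals. The numbers are copied from the job output above; CounterCheck is just an illustrative name.

```java
final class CounterCheck {
    // Values copied from the job counters quoted above.
    static final long ENTRY_ADDED = 81594;
    static final long ENTRY_SIMILAR = 434;
    static final long ENTRY_NO_CHANGES = 14250;
    static final long ENTRY_DUPLICATE = 428;
    static final long ENTRY_EXISTING = 7605;
    static final long ROWS_PARSED = 104311;
    static final long ROW_COUNTER = 104313; // RowCounter result before the job

    // Sum of the five mutually exclusive entry states.
    static long stateTotal() {
        return ENTRY_ADDED + ENTRY_SIMILAR + ENTRY_NO_CHANGES
                + ENTRY_DUPLICATE + ENTRY_EXISTING;
    }

    // Rows counted in the table that never reached map().
    static long unparsedRows() {
        return ROW_COUNTER - ROWS_PARSED;
    }

    public static void main(String[] args) {
        System.out.println("state total   = " + stateTotal());   // 104311
        System.out.println("rows parsed   = " + ROWS_PARSED);    // 104311
        System.out.println("unparsed rows = " + unparsedRows()); // 2
    }
}
```

The state total matches ROWS_PARSED exactly, so every row that reached map() was classified; only 2 rows are unaccounted for on the input side.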

The ROWS_PARSED counter is incremented right at the beginning, like
this (I removed the comments and javadoc for readability):
    /**
     * The comments ...
     */
    @Override
    public void map(ImmutableBytesWritable row__, Result values, Context context) throws IOException
    {
        context.getCounter(Counters.ROWS_PARSED).increment(1);
        List<KeyValue> KVs = values.list();
        try
        {
            // Get the current row.
            byte[] key = values.getRow();

            // First thing we do, we mark this line to be deleted.
            Delete delete_entry_proposed = new Delete(key);
            delete_entry_proposed.deleteColumn(KVs.get(0).getFamily(), KVs.get(0).getQualifier());
            deletes_entry_proposed.add(delete_entry_proposed);


deletes_entry_proposed is the list of Deletes to send. After each
call to the delete method, the number of elements remaining in this
list is added to NON_DELETED_ROWS, which is 0 at the end, so all rows
should have been deleted correctly.

I re-ran RowCounter after the job, and I still have ROWS=5971
rows in the table. I checked all my "feeding processes" and they are
all stopped.

My table is only one CF with one C with one version.

I guess that the 5971 rows remaining in the table are due to an error
on my side, but I'm not able to find where, since all the counters
match. I will add one counter that sums all the entries in the
delete list before each call to the delete method. This should match the
number of rows.
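
The two tallies discussed here (entries submitted before the call, entries left over after it) can be sketched in plain Java as follows. LeftoverTracker and DeleteSink are hypothetical names. The sink's contract, that applied entries are removed from the list and unapplied ones stay in it, is an assumption based on how the leftover count is described above, so check it against the client javadoc for your HBase version.

```java
import java.util.List;

// Hypothetical bookkeeping around a batch delete call; not HBase code.
final class LeftoverTracker {
    // Stand-in for the table. Assumed contract: entries that were applied are
    // removed from the list, entries that were not applied are left in it.
    interface DeleteSink {
        void delete(List<String> rows);
    }

    private long submitted = 0;  // entries handed to the sink (the planned new counter)
    private long nonDeleted = 0; // entries still in the list after the call

    void submit(DeleteSink sink, List<String> batch) {
        submitted += batch.size();  // count before the call
        sink.delete(batch);
        nonDeleted += batch.size(); // whatever is left was not applied
        batch.clear();              // this sketch drops leftovers; a real job might retry them
    }

    long submittedCount() {
        return submitted;
    }

    long nonDeletedCount() {
        return nonDeleted;
    }
}
```

If submittedCount() equals the number of rows parsed and nonDeletedCount() is 0 while rows still remain in the table, the loss is happening outside this bookkeeping.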

Again, I will re-feed the table today with fresh data and re-run the job...

JM

Re: MR missing lines

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

The job ran this morning and, of course, this time all the rows got processed ;)

So I will give it a few more tries and will keep you posted if I'm able
to reproduce it.

Thanks,

JM

2012/12/16, Jean-Marc Spaggiari <je...@spaggiari.org>:
> Thanks for the suggestions.
>
> I already have logs that display all the exceptions, and there is
> nothing. I can't log all the work done; there is too much :(
>
> I have counters "counting" the rows processed and they match what is
> done, minus what is not processed. I have just added a few other
> counters: one right at the beginning, and one to count the
> records remaining on the delete list, as suggested.
>
> I will run the job again tomorrow, see the result and keep you posted.
>
> JM
>
>
> 2012/12/16, Asaf Mesika <as...@gmail.com>:
>> Did you check the returned array of the delete method to make sure all
>> records sent for delete have been deleted?
>>
>> Sent from my iPhone
>>
>> On 16 בדצמ 2012, at 14:52, Jean-Marc Spaggiari <je...@spaggiari.org>
>> wrote:
>>
>>> Hi,
>>>
>>> I have a table where I'm running MR each time is exceding 100 000 rows.
>>>
>>> When the target is reached, all the feeding process are stopped.
>>>
>>> Yesterday it reached 123608 rows. So I stopped the feeding process,
>>> and ran the MR.
>>>
>>> For each line, the MR is creating a delete. The delete is placed on a
>>> list, and when the list reached 10 elements, it's sent to the table.
>>> In the clean method, the list is sent to the table if there is any
>>> element in it.
>>>
>>> So at the en of the MR, I should have an empty table.
>>>
>>> The table is splitted over 128 regions. And I have 8 region servers.
>>>
>>> What is disturbing me is that after the MR, I had 38 lines remaining
>>> on the table. the MR took 348 minutes to run. So I ran the MR again,
>>> which this time took 2 minutes, and now I have 1 row remaining in the
>>> table.
>>>
>>> I looked at the logs (for the 38 lines run) and there is nothing in
>>> it. There is some scanner timeout exception for the run of the 100K
>>> rows.
>>>
>>> I'm running HBase 0.94.3.
>>>
>>> I will hava another 100K rows today, so I will re-run the job. I will
>>> increase the timeout to make sure I got no exception, but even when I
>>> ran the 38 lines with no exception one was remaining...
>>>
>>> Any idea why and where I can seach? It's not really an issue for me
>>> since I can just re-run the job, but this might be an issue for some
>>> others.
>>>
>>> JM
>>
>

Re: MR missing lines

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

Thanks for the suggestions.

I already log all the exceptions and there is nothing in the logs. I
can't log all the work done; there is too much :(

I have counters "counting" the rows processed, and they match what was
done, minus what was not processed. I have just added a few more
counters: one right at the beginning, and one to count the records
remaining in the delete list, as suggested.
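
A minimal sketch of how such counters could be reconciled at the end of a run. The counter names (ROWS_SEEN, DELETES_SENT, DELETES_LEFT_IN_LIST) and the CounterCheck class are hypothetical; the job's real counters are not shown in this thread. In a Hadoop mapper the increments would typically go through context.getCounter(...).increment(1); a plain EnumMap stands in here so the sketch runs on its own.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical counter names, chosen for illustration only. */
enum JobCounter { ROWS_SEEN, DELETES_SENT, DELETES_LEFT_IN_LIST }

/** Collects counter values and reports rows that no counter accounts for. */
class CounterCheck {
    private final Map<JobCounter, Long> counts = new EnumMap<>(JobCounter.class);

    /** Adds 'by' to the given counter, creating it on first use. */
    void increment(JobCounter counter, long by) {
        counts.merge(counter, by, Long::sum);
    }

    /** Returns the current value of the counter, or 0 if it was never incremented. */
    long get(JobCounter counter) {
        return counts.getOrDefault(counter, 0L);
    }

    /**
     * Rows that were read but were neither sent for deletion nor left in the
     * in-memory list. Any value other than 0 means rows were lost on the
     * client side, before the table was ever asked to delete them.
     */
    long unaccounted() {
        return get(JobCounter.ROWS_SEEN)
                - get(JobCounter.DELETES_SENT)
                - get(JobCounter.DELETES_LEFT_IN_LIST);
    }
}
```

If unaccounted() is 0 and DELETES_LEFT_IN_LIST is 0 while rows still remain in the table, the client sent every delete and the place to look is the server side or the scan itself.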

I will run the job again tomorrow, see the result and keep you posted.

JM



Re: MR missing lines

Posted by Asaf Mesika <as...@gmail.com>.

Did you check the returned array of the delete method to make sure all
records sent for delete have been deleted?
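
A minimal sketch of that check, written in plain Java so it runs without a cluster. DeleteSink and BufferedDeleter are hypothetical names: DeleteSink stands in for the table client and returns the row keys it could not delete. As far as I recall, in the 0.94 client HTable.delete(List<Delete>) returns nothing and instead leaves the failed deletes in the list it was given, while HTable.batch(...) reports per-action results in an Object[]; please check the Javadoc for your version before relying on either.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the table client; this is not an HBase type. */
interface DeleteSink {
    /** Tries to delete every row key in the batch and returns the keys that failed. */
    List<String> deleteBatch(List<String> rowKeys);
}

/** Buffers row keys, flushes them in fixed-size batches, and remembers failures. */
class BufferedDeleter {
    private final DeleteSink sink;
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private final List<String> failed = new ArrayList<>();
    private long submitted = 0;

    BufferedDeleter(DeleteSink sink, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.sink = sink;
        this.batchSize = batchSize;
    }

    /** Called once per input row, as the mapper's map() would do. */
    void add(String rowKey) {
        pending.add(rowKey);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Called once at the end, as the mapper's cleanup() would do. */
    void close() {
        if (!pending.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        // Hand the sink a copy so it cannot modify our buffer behind our back.
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        submitted += batch.size();
        // Keep whatever the sink reports as not deleted instead of dropping it.
        failed.addAll(sink.deleteBatch(batch));
    }

    long submitted() { return submitted; }
    int pending() { return pending.size(); }
    List<String> failed() { return new ArrayList<>(failed); }
}
```

After close(), a non-empty failed() list names exactly the rows to retry or log, and submitted() can be compared with the number of rows the job read.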
