Posted to user@hbase.apache.org by Dru Jensen <dr...@gmail.com> on 2008/09/05 21:03:16 UTC
missing rows in MR process
hbase-users,
I have two MR processes that run one right after the other in a
script. The first reads from a file and populates a table. The
second uses a TableMap over that table that was just populated.
The first MR process inserted 1950 rows successfully and everything
looked correct. For some reason the second MR process only got 76
rows as input. I ran the exact same MR process and the second time it
got all 1950 rows.
Is there some time delay between the MR batch update of the first
process and the scan of the second? How can I make sure this commit
is complete before launching the second MR process?
This is using the Release Candidate 0.2.1 running on Hadoop 0.17.2.1.
thanks,
Dru
Re: missing rows in MR process
Posted by Dru Jensen <dr...@gmail.com>.
Aaahh Yes. Thanks.
On Sep 8, 2008, at 10:59 AM, stack wrote:
> Dru Jensen wrote:
>> Hi StAck,
>>
>> No, I don't think I'm hitting this. The first MR process is using
>> in: SequenceFileInputFormat out: TableReduce. The second is using
>> in: TableMap out: TableReduce. I don't think the out-of-the-box
>> TableMap is using a filter, correct?
>>
> It looks like it is.
>
> The TableMap job makes a task per region by default. Each task then
> runs a scanner whose compass is defined by the region start/end
> row. When you get a scanner specifying a start/end row, it
> eventually does the following:
>
> public Scanner getScanner(final byte [][] columns,
>     final byte [] startRow, final byte [] stopRow, final long timestamp)
> throws IOException {
>   return getScanner(columns, startRow, timestamp,
>     new WhileMatchRowFilter(new StopRowFilter(stopRow)));
> }
>
> ... i.e. it puts a StopRowFilter in place.
>
> So, maybe you are tripping over HBASE-856.
>
> St.Ack
>
>
Re: missing rows in MR process
Posted by stack <st...@duboce.net>.
Dru Jensen wrote:
> Hi StAck,
>
> No, I don't think I'm hitting this. The first MR process is using in:
> SequenceFileInputFormat out: TableReduce. The second is using in:
> TableMap out: TableReduce. I don't think the out-of-the-box TableMap
> is using a filter, correct?
>
It looks like it is.
The TableMap job makes a task per region by default. Each task then
runs a scanner whose compass is defined by the region start/end row.
When you get a scanner specifying a start/end row, it eventually does
the following:
public Scanner getScanner(final byte [][] columns,
    final byte [] startRow, final byte [] stopRow, final long timestamp)
throws IOException {
  return getScanner(columns, startRow, timestamp,
    new WhileMatchRowFilter(new StopRowFilter(stopRow)));
}
... i.e. it puts a StopRowFilter in place.
So, maybe you are tripping over HBASE-856.
St.Ack
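[Editor's illustration] The per-region scan described above can be modelled in plain Java. This is only a rough sketch under stated assumptions, not HBase code: the class and method names (StopRowScanSketch, stopRowFilter, scanWhileMatch, countAcrossRegions) and the region boundaries are hypothetical, and the stop row is assumed to be exclusive. It shows one bounded scan per region, where each scan ends at the first row its stop-row check rejects, and the per-region counts add up to the full table when every scan behaves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical stand-in for illustration only; none of these names are HBase classes.
class StopRowScanSketch {

    /** Rejects every row key at or beyond stopRow (assumed exclusive stop row). */
    static Predicate<String> stopRowFilter(String stopRow) {
        return rowKey -> rowKey.compareTo(stopRow) >= 0;
    }

    /**
     * Scans rows in key order from startRow and ends the scan at the first
     * row the filter rejects, which models wrapping the stop-row check in a
     * "while match" filter.
     */
    static List<String> scanWhileMatch(NavigableMap<String, String> table,
                                       String startRow,
                                       Predicate<String> rejects) {
        List<String> seen = new ArrayList<>();
        for (String rowKey : table.tailMap(startRow, true).keySet()) {
            if (rejects.test(rowKey)) {
                break; // the whole scan ends here, not just this row
            }
            seen.add(rowKey);
        }
        return seen;
    }

    /** Sums the rows seen by one bounded scan per region, as one task per region would. */
    static int countAcrossRegions(NavigableMap<String, String> table, String[] boundaries) {
        int total = 0;
        for (int i = 0; i + 1 < boundaries.length; i++) {
            total += scanWhileMatch(table, boundaries[i],
                                    stopRowFilter(boundaries[i + 1])).size();
        }
        return total;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 1950; i++) {
            table.put(String.format("row%04d", i), "value");
        }
        // Hypothetical region boundaries; the last one sorts after every key.
        String[] boundaries = {"row0000", "row0650", "row1300", "row9999"};
        System.out.println(countAcrossRegions(table, boundaries)); // prints 1950
    }
}
```

In this model, a scan that ends early for any reason silently shrinks the input of its task, so comparing the per-region counts against the number of rows written by the first job is one way to narrow down which task came up short.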
Re: missing rows in MR process
Posted by Dru Jensen <dr...@gmail.com>.
Hi StAck,
No, I don't think I'm hitting this. The first MR process is using in:
SequenceFileInputFormat out: TableReduce. The second is using in:
TableMap out: TableReduce. I don't think the out-of-the-box TableMap
is using a filter, correct?
Dru
On Sep 5, 2008, at 3:59 PM, stack wrote:
> This is odd Dru. Do you think you are seeing https://issues.apache.org/jira/browse/HBASE-856?
> Are you using filters?
> St.Ack
Re: missing rows in MR process
Posted by stack <st...@duboce.net>.
This is odd Dru. Do you think you are seeing
https://issues.apache.org/jira/browse/HBASE-856? Are you using filters?
St.Ack