
Posted to user@hbase.apache.org by kiran chitturi <ch...@gmail.com> on 2013/02/10 01:14:54 UTC

Inconsistent row count between mapreduce and shell count

Hi!

I am using HBase 0.94.1 on a distributed cluster of 20 nodes.

When I run count on a table in the HBase shell, I get a count of
2152416 rows.

When I do the same thing with the RowCounter MapReduce job, I get the
following value:

org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
13/02/10 00:05:06 INFO mapred.JobClient:     ROWS=1389991

The same thing happens when I use Pig to count or run other operations:
the two results are inconsistent.

During the MapReduce job, I noticed that 5 tasks were killed. When I
traced them back in the TaskTracker logs of the node, I found entries
similar to the log below.

2013-02-09_23:58:58.40665 13/02/09 23:58:58 INFO mapred.TaskTracker: JVM with ID: jvm_201302090035_0015_m_1905604998 given task: attempt_201302090035_0015_m_000012_1
2013-02-09_23:59:03.57016 13/02/09 23:59:03 INFO mapred.TaskTracker: Received KillTaskAction for task: attempt_201302090035_0015_m_000012_1
2013-02-09_23:59:03.57034 13/02/09 23:59:03 INFO mapred.TaskTracker: About to purge task: attempt_201302090035_0015_m_000012_1
2013-02-09_23:59:03.61003 13/02/09 23:59:03 INFO util.ProcessTree: Killing process group9745 with signal TERM. Exit code 0
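For anyone digging through many of these TaskTracker logs, here is a minimal Python sketch that pulls the task ID and attempt number out of each "Received KillTaskAction" entry. Hadoop numbers attempts from 0, so attempt_..._000012_1 is the second attempt of map task 000012. Listing killed attempts per task makes it easy to check that each of those tasks also finished through another attempt (for example when speculative execution launched a duplicate). The function name `killed_attempts` and the regular expression are illustrative assumptions, not part of Hadoop.

```python
import re
from collections import defaultdict

# Hadoop 1.x attempt IDs look like attempt_<jt id>_<job>_<m|r>_<task>_<attempt>,
# e.g. attempt_201302090035_0015_m_000012_1 (attempt numbers start at 0).
KILL_RE = re.compile(
    r"Received KillTaskAction for task:\s*"
    r"attempt_(\d+)_(\d+)_([mr])_(\d+)_(\d+)"
)


def killed_attempts(log_text):
    """Map each task ID to the attempt numbers that received a kill action.

    Hypothetical helper: whitespace is collapsed first so that entries
    wrapped across lines (as in mailed logs) still match.
    """
    flat = " ".join(log_text.split())
    killed = defaultdict(list)
    for jt, job, kind, task, attempt in KILL_RE.findall(flat):
        killed[f"task_{jt}_{job}_{kind}_{task}"].append(int(attempt))
    return dict(killed)


if __name__ == "__main__":
    sample = """2013-02-09_23:59:03.57016 13/02/09 23:59:03 INFO mapred.TaskTracker:
Received KillTaskAction for task: attempt_201302090035_0015_m_000012_1"""
    print(killed_attempts(sample))
    # {'task_201302090035_0015_m_000012': [1]}
```

A task ID that shows up here but never completed successfully would be worth a closer look; one that did complete through attempt 0 is just a discarded duplicate.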

I have also tried running the 'hbck' tool, but it reports no inconsistencies.

Can you please suggest why there is an inconsistency and how I can
correct it?

Thanks,
-- 
Kiran Chitturi

Re: Inconsistent row count between mapreduce and shell count

Posted by Ted Yu <yu...@gmail.com>.

Kiran:
Take a look at src/main/ruby/shell/commands/move.rb

It includes help on how to move a region.
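For spreading regions by hand: the shell's `move` command takes the encoded region name (the hash suffix of the full region name) plus an optional target server written as 'host,port,startcode'. Below is a small Python sketch that turns a list of full region names into `move` lines, assigning targets round-robin. The round-robin plan, the helper names and all region and server names in the usage are hypothetical, and the HBase balancer may move regions again afterwards.

```python
from itertools import cycle


def encoded_name(region_name):
    """Return the encoded suffix of a full region name.

    Full region names look like '<table>,<start key>,<timestamp>.<encoded>.'
    so the encoded part sits between the last two dots.
    """
    if not region_name.endswith("."):
        raise ValueError(f"not a full region name: {region_name!r}")
    return region_name[:-1].rsplit(".", 1)[1]


def move_commands(region_names, servers):
    """Build HBase shell `move` lines, assigning target servers round-robin.

    `servers` are 'host,port,startcode' strings (hypothetical values below).
    """
    targets = cycle(servers)
    return [f"move '{encoded_name(r)}', '{next(targets)}'" for r in region_names]


if __name__ == "__main__":
    regions = [  # hypothetical region names
        "documents,,1360000000000.aaaa1111.",
        "documents,m,1360000000001.bbbb2222.",
    ]
    servers = [  # hypothetical region servers
        "node1.example.com,60020,1360000000123",
        "node2.example.com,60020,1360000000456",
    ]
    for line in move_commands(regions, servers):
        print(line)
```

The printed lines can be pasted into the HBase shell one at a time, which keeps each move easy to check on the master UI.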

Cheers

On Sat, Feb 9, 2013 at 9:46 PM, kiran chitturi <ch...@gmail.com> wrote:


Re: Inconsistent row count between mapreduce and shell count

Posted by kiran chitturi <ch...@gmail.com>.

Many thanks, Lars, for your suggestions! I have added them to the command:

/opt/hadoop-1.0.4/bin/hadoop jar /opt/hbase-0.94.1/hbase-0.94.1.jar \
    rowcounter -Dhbase.zookeeper.quorum="LucidN1,LucidN2,LucidN3" \
    -Dhbase.client.scanner.caching=1000 \
    -Dmapred.map.tasks.speculative.execution=false documents

I stopped the data sources that write data into the table, but it did
not help: the count the RowCounter MapReduce job reports is not much
different.

However, the row count it returns is stable once I stop writing data into
the table (I ran the command 3 times). The shell count also stays the
same once I stop writing.

Since most of the rows are tweets, around 1.4 million rows are stored on a
single data node (region server).

Do you know of any way I can reassign the regions in the table without
losing data? Would that make a difference?

Thank you,
Kiran.




On Sat, Feb 9, 2013 at 11:38 PM, lars hofhansl <la...@apache.org> wrote:




-- 
Kiran Chitturi

Re: Inconsistent row count between mapreduce and shell count

Posted by lars hofhansl <la...@apache.org>.

That all looks as it should.
Unless you somehow pointed the M/R job at another cluster, I have no good explanation.


It would be interesting to see whether, in the absence of writes, you always get precisely the same numbers.
(It looks like that might be the case; your 2nd run is not wildly different from the first.)


This is a bit disconcerting. Is there anything "interesting" in the logs?


Aside: for performance reasons you probably want to enable scanner caching for the M/R job: -Dhbase.client.scanner.caching=100 (or 1000)
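To see why scanner caching matters for a full-table count, here is a rough Python estimate of scanner round trips. Every region gets its own scanner, and each next() RPC returns at most `caching` rows, so a region needs about ceil(rows / caching) calls. This ignores scanner open/close calls and the final empty batch. When the property is not set, the 0.94 client default is 1 row per call. The helper name is hypothetical.

```python
def scan_round_trips(rows_per_region, caching):
    """Approximate next() RPCs needed to scan every region once.

    One scanner per region; each call returns at most `caching` rows.
    Open/close calls and the final empty batch are not counted.
    """
    if caching < 1:
        raise ValueError("caching must be >= 1")
    # Integer ceiling division: ceil(rows / caching) without floats.
    return sum((rows + caching - 1) // caching for rows in rows_per_region)


if __name__ == "__main__":
    # Shell count from this thread, treated as a single region for simplicity.
    rows = [2152416]
    for caching in (1, 100, 1000):
        print(caching, scan_round_trips(rows, caching))
    # 1 2152416
    # 100 21525
    # 1000 2153
```

Going from 1 to 1000 rows per call cuts roughly two million round trips to about two thousand. This speeds the job up but should not change the count itself.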

And also turn off speculative execution (we should do that by default): -Dmapred.map.tasks.speculative.execution=false

It might be speculative execution that throws the job off; I am just guessing now.


-- Lars

________________________________
From: kiran chitturi <ch...@gmail.com>
To: user <us...@hbase.apache.org>; lars hofhansl <la...@apache.org>
Sent: Saturday, February 9, 2013 6:51 PM
Subject: Re: Inconsistent row count between mapreduce and shell count

Re: Inconsistent row count between mapreduce and shell count

Posted by kiran chitturi <ch...@gmail.com>.

On Sat, Feb 9, 2013 at 9:17 PM, lars hofhansl <la...@apache.org> wrote:

> Hmm... Can you show us the exact commands you executed?

Here are the exact commands I used.

In the HBase shell, for the table documents, I used:
   count 'documents'

The MapReduce command is:
    /opt/hadoop-1.0.4/bin/hadoop jar /opt/hbase-0.94.1/hbase-0.94.1.jar \
      rowcounter -Dhbase.zookeeper.quorum="LucidN1,LucidN2,LucidN3" documents



> And just to rule out the obvious:
> 1. There were no writes while you did the row count?
>
   Actually, we have a few automated programs that write tweets to the
   table over time, so there may have been writes while the row count was
   running. Should I disable writes while running the MapReduce job?

> 2. In the RowCount M/R case you specified neither a range nor any columns?
>
   No

>
> Do you always get the exact same numbers in both cases? Or do they vary?
>
   I just ran the MapReduce job again, and this time the number is 1394234.
   The actual count from the shell is 2157447.
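Here is a quick worked comparison of the two pairs of numbers reported in this thread, as a short Python sketch. Note that the second pair was taken while writes may still have been running. In both runs the RowCounter job sees about 64.6% of the rows, and the gap stays at roughly 762 to 763 thousand rows. That pattern suggests a systematic shortfall rather than random loss.

```python
# (shell count, RowCounter count) pairs reported in this thread.
RUNS = [(2152416, 1389991), (2157447, 1394234)]


def compare(shell, mapreduce):
    """Return the number of missing rows and the fraction RowCounter saw."""
    return shell - mapreduce, mapreduce / shell


if __name__ == "__main__":
    for shell, mr in RUNS:
        gap, seen = compare(shell, mr)
        print(f"shell={shell} rowcounter={mr} gap={gap} seen={seen:.1%}")
    # shell=2152416 rowcounter=1389991 gap=762425 seen=64.6%
    # shell=2157447 rowcounter=1394234 gap=763213 seen=64.6%
```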

Thanks!




-- 
Kiran Chitturi

Re: Inconsistent row count between mapreduce and shell count

Posted by lars hofhansl <la...@apache.org>.

Hmm... Can you show us the exact commands you executed?

And just to rule out the obvious:
1. There were no writes while you did the row count?
2. In the RowCount M/R case you specified neither a range nor any columns?


Do you always get the exact same numbers in both cases? Or do they vary?

Thanks.

-- Lars


----- Original Message -----
From: kiran chitturi <ch...@gmail.com>
To: user <us...@hbase.apache.org>
Cc: 
Sent: Saturday, February 9, 2013 4:49 PM
Subject: Re: Inconsistent row count between mapreduce and shell count


Re: Inconsistent row count between mapreduce and shell count

Posted by kiran chitturi <ch...@gmail.com>.

Yes. I just counted the number of regions on
'http://machine1:60010/table.jsp?name=documents', and the count is 53, which
equals the number of completed tasks in Hadoop.
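Some background on why Ted's check works, as I understand TableInputFormat in 0.94: a full-table scan job gets one input split, and therefore one map task, per region, covering that region's [start key, end key) range. Below is a hypothetical Python sketch of that mapping. It adds a contiguity check that would flag a hole or overlap between neighbouring regions, which is the kind of problem hbck looks for. Region boundaries here are plain byte strings, with b"" meaning unbounded.

```python
def region_splits(regions):
    """Return one (start_key, end_key) split per region, sorted by start key.

    `regions` holds (start_key, end_key) byte strings; b"" marks the open
    start of the first region and the open end of the last one.
    Raises ValueError if neighbouring regions leave a hole or overlap.
    """
    ordered = sorted(regions, key=lambda r: r[0])
    if not ordered or ordered[0][0] != b"" or ordered[-1][1] != b"":
        raise ValueError("key space is not covered at the ends")
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if prev_end != next_start:
            raise ValueError(f"hole or overlap between {prev_end!r} and {next_start!r}")
    return ordered


if __name__ == "__main__":
    # Hypothetical three-region table: expect three map tasks.
    table = [(b"g", b"p"), (b"", b"g"), (b"p", b"")]
    splits = region_splits(table)
    print(len(splits), "splits ->", len(splits), "map tasks")
```

With 53 contiguous regions, this model predicts 53 successful map tasks, which matches what the job reported. The missing rows are therefore more likely inside some regions' scans than in an uncovered key range.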


Thanks,
Kiran.


On Sat, Feb 9, 2013 at 7:43 PM, Ted Yu <yu...@gmail.com> wrote:




-- 
Kiran Chitturi

Re: Inconsistent row count between mapreduce and shell count

Posted by Ted Yu <yu...@gmail.com>.

Apart from the 5 killed tasks, was the number of successful tasks equal to
the number of regions in your table?

Thanks

On Sat, Feb 9, 2013 at 4:14 PM, kiran chitturi <ch...@gmail.com> wrote:
