Posted to user@hbase.apache.org by Stuti Awasthi <st...@hcl.com> on 2011/11/16 08:41:03 UTC

Facing Issues with RowCounter

Hi,
I tried to use the MapReduce RowCounter to count the rows of a table for a specific column family, but it does not display the correct result.

Command (table name as the only argument): Hbase/hbase-0.90.3/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter Keyword
Output:
11/11/16 13:04:31 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
11/11/16 13:04:32 INFO mapred.JobClient:  map 100% reduce 0%
11/11/16 13:04:32 INFO mapred.JobClient: Job complete: job_local_0001
11/11/16 13:04:32 INFO mapred.JobClient: Counters: 6
11/11/16 13:04:32 INFO mapred.JobClient:   org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
11/11/16 13:04:32 INFO mapred.JobClient:     ROWS=7
11/11/16 13:04:32 INFO mapred.JobClient:   FileSystemCounters
11/11/16 13:04:32 INFO mapred.JobClient:     FILE_BYTES_READ=2373099
11/11/16 13:04:32 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2411923
11/11/16 13:04:32 INFO mapred.JobClient:   Map-Reduce Framework
11/11/16 13:04:32 INFO mapred.JobClient:     Map input records=7
11/11/16 13:04:32 INFO mapred.JobClient:     Spilled Records=0
11/11/16 13:04:32 INFO mapred.JobClient:     Map output records=0

Command (table name and column family): Hbase/hbase-0.90.3/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter Keyword Set

Output:
11/11/16 13:05:33 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
11/11/16 13:05:34 INFO mapred.JobClient:  map 100% reduce 0%
11/11/16 13:05:34 INFO mapred.JobClient: Job complete: job_local_0001
11/11/16 13:05:34 INFO mapred.JobClient: Counters: 5
11/11/16 13:05:34 INFO mapred.JobClient:   FileSystemCounters
11/11/16 13:05:34 INFO mapred.JobClient:     FILE_BYTES_READ=2373107
11/11/16 13:05:34 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2411939
11/11/16 13:05:34 INFO mapred.JobClient:   Map-Reduce Framework
11/11/16 13:05:34 INFO mapred.JobClient:     Map input records=7
11/11/16 13:05:34 INFO mapred.JobClient:     Spilled Records=0
11/11/16 13:05:34 INFO mapred.JobClient:     Map output records=0
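
The two runs differ in one visible way: the first reports a ROWS counter and the second does not, although both read 7 map input records. A minimal sketch of how that comparison could be automated is shown below. The helper names (`parse_counters`, `rows_missing`) are hypothetical, and the regular expression only assumes the `NAME=VALUE` layout of the JobClient lines quoted above.

```python
import re

# Matches counter lines such as
# "11/11/16 13:04:32 INFO mapred.JobClient:     ROWS=7".
COUNTER_LINE = re.compile(r"INFO mapred\.JobClient:\s+([^=]+?)=(\d+)\s*$")


def parse_counters(log_text):
    """Return {counter name: integer value} for every NAME=VALUE line."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_LINE.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def rows_missing(counters):
    """True when records were read but the ROWS counter is absent or zero."""
    return counters.get("Map input records", 0) > 0 and counters.get("ROWS", 0) == 0


# Excerpts copied from the two job outputs above.
table_only = """\
11/11/16 13:04:32 INFO mapred.JobClient:     ROWS=7
11/11/16 13:04:32 INFO mapred.JobClient:     Map input records=7
"""
with_family = """\
11/11/16 13:05:34 INFO mapred.JobClient:     FILE_BYTES_READ=2373107
11/11/16 13:05:34 INFO mapred.JobClient:     Map input records=7
"""

print(rows_missing(parse_counters(table_only)))   # False
print(rows_missing(parse_counters(with_family)))  # True
```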

Output of the table describe command:
TABLE => {{NAME => 'Keyword', FAMILIES => [{NAME => 'Info', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'Set', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}

Am I running it the wrong way, or is this a bug?

Regards,
Stuti Awasthi
HCL Comnet Systems and Services Ltd
F-8/9 Basement, Sec-3,Noida.



RE: Facing Issues with RowCounter

Posted by Stuti Awasthi <st...@hcl.com>.
OK.
Thanks for the update. I'll check the patch; otherwise I can write my own MR job for the row count.

Cheers
Stuti


Re: Facing Issues with RowCounter

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Awesome! Thanks for the feedback!

J-D

On Thu, Nov 17, 2011 at 11:07 PM, Stuti Awasthi <st...@hcl.com> wrote:
> Hi JD,
> I have applied the patch and tested it; it's working fine now. :) Thanks

RE: Facing Issues with RowCounter

Posted by Stuti Awasthi <st...@hcl.com>.
Hi JD,
I have applied the patch and tested it; it's working fine now. :) Thanks


Re: Facing Issues with RowCounter

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Ah! Took me a moment to figure it out, it's:

https://issues.apache.org/jira/browse/HBASE-4295 "rowcounter does not
return the correct number of rows in certain circumstances"

What made me think about it is that your counters do say that rows
were taken into input, but none counted because the values are empty.
That was the problem in 4295.
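
To make the symptom concrete, here is a simplified model in plain Python. It is not HBase source code: the two functions are hypothetical stand-ins that assume the pre-patch behaviour was "count a row only if it has at least one non-empty value", as the description above suggests, and that the patched behaviour is "count every row the scan returns". The row keys and qualifiers are taken from the scan output of the 'Set' family quoted in this thread, where every value is empty.

```python
def count_rows_nonempty_only(rows):
    """Assumed pre-patch behaviour: skip rows whose values are all empty."""
    return sum(
        1 for cells in rows.values()
        if any(len(value) > 0 for value in cells.values())
    )


def count_rows_all(rows):
    """Assumed patched behaviour: every row returned by the scan counts."""
    return len(rows)


# Row key -> qualifiers in the 'Set' family, from the scan in this thread.
qualifiers = {
    "Apache": ["Fuse", "Hadoop", "Hive", "MySql", "PHP"],
    "Fuse": ["Apache", "Hdfs"],
    "Hadoop": ["Apache", "Hive"],
    "Hdfs": ["Fuse"],
    "Hive": ["Apache", "Hadoop"],
    "MySql": ["Apache", "PHP"],
    "PHP": ["Apache", "MySql"],
}

# Every cell holds an empty value, matching "value=" in the scan output.
set_family = {
    row: {"Set:" + name: b"" for name in names}
    for row, names in qualifiers.items()
}

print(count_rows_nonempty_only(set_family))  # 0
print(count_rows_all(set_family))            # 7
```

Under these assumptions the model gives 0 rows for the 'Set' family with the first function and 7 with the second, which matches the gap between the job counters and the shell scan.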

The patch is currently only in the tip of the 0.90 branch, so unless
you patch it yourself you'll have to wait for 0.90.5 (which may or may
not get released, depending on whether someone wants to do it).

J-D

On Wed, Nov 16, 2011 at 9:27 PM, Stuti Awasthi <st...@hcl.com> wrote:
> Hi JD,
>
> The table 'Keyword' contains the 'Set' column family with 7 rows. Here is the output of the scan:
>
> hbase(main):001:0> scan 'Keyword',{COLUMNS=>['Set']}
> ROW                                COLUMN+CELL
>  Apache                            column=Set:Fuse, timestamp=1321506922206, value=
>  Apache                            column=Set:Hadoop, timestamp=1321506922206, value=
>  Apache                            column=Set:Hive, timestamp=1321506922206, value=
>  Apache                            column=Set:MySql, timestamp=1321506922206, value=
>  Apache                            column=Set:PHP, timestamp=1321506922206, value=
>  Fuse                                column=Set:Apache, timestamp=1321506922206, value=
>  Fuse                                column=Set:Hdfs, timestamp=1321506922209, value=
>  Hadoop                            column=Set:Apache, timestamp=1321506922209, value=
>  Hadoop                            column=Set:Hive, timestamp=1321506922212, value=
>  Hdfs                              column=Set:Fuse, timestamp=1321506922212, value=
>  Hive                              column=Set:Apache, timestamp=1321506922212, value=
>  Hive                              column=Set:Hadoop, timestamp=1321506922214, value=
>  MySql                             column=Set:Apache, timestamp=1321506922214, value=
>  MySql                             column=Set:PHP, timestamp=1321506922216, value=
>  PHP                               column=Set:Apache, timestamp=1321506922216, value=
>  PHP                               column=Set:MySql, timestamp=1321506922218, value=
> 7 row(s) in 0.4120 seconds
>
> This row count is not reflected in the RowCounter MR job output.
>
> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
> Sent: Wednesday, November 16, 2011 11:09 PM
> To: user@hbase.apache.org
> Subject: Re: Facing Issues with RowCounter
>
> What I can decipher from those outputs is that you have a total of 7 rows, and none of them have data in the "Set" column family. Is that the case or not? Without more info from you, it's hard to tell.
>
> J-D

RE: Facing Issues with RowCounter

Posted by Stuti Awasthi <st...@hcl.com>.
Hi JD,

The 'Keyword' table has 7 rows with data in the 'Set' column family. Here is the scan output:

hbase(main):001:0> scan 'Keyword',{COLUMNS=>['Set']}
ROW                                COLUMN+CELL
 Apache                            column=Set:Fuse, timestamp=1321506922206, value=
 Apache                            column=Set:Hadoop, timestamp=1321506922206, value=
 Apache                            column=Set:Hive, timestamp=1321506922206, value=
 Apache                            column=Set:MySql, timestamp=1321506922206, value=
 Apache                            column=Set:PHP, timestamp=1321506922206, value=
 Fuse                              column=Set:Apache, timestamp=1321506922206, value=
 Fuse                              column=Set:Hdfs, timestamp=1321506922209, value=
 Hadoop                            column=Set:Apache, timestamp=1321506922209, value=
 Hadoop                            column=Set:Hive, timestamp=1321506922212, value=
 Hdfs                              column=Set:Fuse, timestamp=1321506922212, value=
 Hive                              column=Set:Apache, timestamp=1321506922212, value=
 Hive                              column=Set:Hadoop, timestamp=1321506922214, value=
 MySql                             column=Set:Apache, timestamp=1321506922214, value=
 MySql                             column=Set:PHP, timestamp=1321506922216, value=
 PHP                               column=Set:Apache, timestamp=1321506922216, value=
 PHP                               column=Set:MySql, timestamp=1321506922218, value=
7 row(s) in 0.4120 seconds

The RowCounter MR job does not report these rows.
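
As a cross-check on the scan output above, here is a minimal Python sketch that counts distinct row keys and cells in pasted `scan` output. The function name `count_rows` and the parsing rule (the row key is the first whitespace-separated token on every line containing `column=`) are assumptions made for illustration and are not part of HBase; row keys that contain spaces would break it. The sample is an abbreviated excerpt of the listing above.

```python
def count_rows(scan_output):
    """Return (distinct_row_keys, cell_count) for pasted hbase shell scan output."""
    rows = set()
    cells = 0
    for line in scan_output.splitlines():
        if "column=" not in line:
            continue  # skip the prompt, the header, and the summary line
        rows.add(line.split()[0])  # assumed: row key is the first token
        cells += 1
    return len(rows), cells


# Abbreviated excerpt of the scan output (first three cells only).
sample = """\
ROW                                COLUMN+CELL
 Apache                            column=Set:Fuse, timestamp=1321506922206, value=
 Apache                            column=Set:Hadoop, timestamp=1321506922206, value=
 Fuse                              column=Set:Apache, timestamp=1321506922206, value=
"""

print(count_rows(sample))  # (2, 3)
```

Fed the full listing, the same function should report 7 distinct rows across 16 cells, which matches the `7 row(s)` summary printed by the shell.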

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: Wednesday, November 16, 2011 11:09 PM
To: user@hbase.apache.org
Subject: Re: Facing Issues with RowCounter

What I can decipher from those outputs is that you have a total of 7 rows, and none of them have data in the "Set" column family. Is that the case? Without more info from you, it's hard to tell.

J-D

On Tue, Nov 15, 2011 at 11:41 PM, Stuti Awasthi <st...@hcl.com> wrote:
> Hi,
> I tried to use MR RowCounter to count the rows of a table with specific column family. But it is not displaying correct result.
>
> Command (Only Table Name as argument ):  Hbase/hbase-0.90.3/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter Keyword
> Output :
> 11/11/16 13:04:31 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
> 11/11/16 13:04:32 INFO mapred.JobClient:  map 100% reduce 0%
> 11/11/16 13:04:32 INFO mapred.JobClient: Job complete: job_local_0001
> 11/11/16 13:04:32 INFO mapred.JobClient: Counters: 6
> 11/11/16 13:04:32 INFO mapred.JobClient:   org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
> 11/11/16 13:04:32 INFO mapred.JobClient:     ROWS=7
> 11/11/16 13:04:32 INFO mapred.JobClient:   FileSystemCounters
> 11/11/16 13:04:32 INFO mapred.JobClient:     FILE_BYTES_READ=2373099
> 11/11/16 13:04:32 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2411923
> 11/11/16 13:04:32 INFO mapred.JobClient:   Map-Reduce Framework
> 11/11/16 13:04:32 INFO mapred.JobClient:     Map input records=7
> 11/11/16 13:04:32 INFO mapred.JobClient:     Spilled Records=0
> 11/11/16 13:04:32 INFO mapred.JobClient:     Map output records=0
>
> Command (TableName, ColumnFamily): Hbase/hbase-0.90.3/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter Keyword Set
>
> Output :
> 11/11/16 13:05:33 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
> 11/11/16 13:05:34 INFO mapred.JobClient:  map 100% reduce 0%
> 11/11/16 13:05:34 INFO mapred.JobClient: Job complete: job_local_0001
> 11/11/16 13:05:34 INFO mapred.JobClient: Counters: 5
> 11/11/16 13:05:34 INFO mapred.JobClient:   FileSystemCounters
> 11/11/16 13:05:34 INFO mapred.JobClient:     FILE_BYTES_READ=2373107
> 11/11/16 13:05:34 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2411939
> 11/11/16 13:05:34 INFO mapred.JobClient:   Map-Reduce Framework
> 11/11/16 13:05:34 INFO mapred.JobClient:     Map input records=7
> 11/11/16 13:05:34 INFO mapred.JobClient:     Spilled Records=0
> 11/11/16 13:05:34 INFO mapred.JobClient:     Map output records=0
>
> Table Describe command Output is :
> TABLE => {{NAME => 'Keyword', FAMILIES => [{NAME => 'Info', 
> BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 
> 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', 
> IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'Set', 
> BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 
> 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', 
> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>
> Am I executing in wrong way or this is some bug ?
>
> Regards,
> Stuti Awasthi
> HCL Comnet Systems and Services Ltd
> F-8/9 Basement, Sec-3,Noida.
>
>

Re: Facing Issues with RowCounter

Posted by Jean-Daniel Cryans <jd...@apache.org>.
What I can decipher from those outputs is that you have a total of 7
rows, and none of them have data in the "Set" column family. Is that the
case? Without more info from you, it's hard to tell.
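
To make that reading of the two job outputs concrete, here is a small Python sketch that pulls the `name=value` counters out of JobClient log lines and compares the two runs. The helper name `parse_counters` and the regular expression are assumptions for illustration only; the log lines are copied from the quoted outputs below.

```python
import re

# Matches counter lines such as "... INFO mapred.JobClient:     ROWS=7".
COUNTER_RE = re.compile(r"mapred\.JobClient:\s+([^=]+?)=(\d+)\s*$")


def parse_counters(log_text):
    """Return a dict of counter name -> integer value from JobClient log lines."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


# Excerpt of the run with only the table name as argument.
table_only = """\
11/11/16 13:04:32 INFO mapred.JobClient: Counters: 6
11/11/16 13:04:32 INFO mapred.JobClient:     ROWS=7
11/11/16 13:04:32 INFO mapred.JobClient:     Map input records=7
"""

# Excerpt of the run with the table name and the "Set" column family.
with_family = """\
11/11/16 13:05:34 INFO mapred.JobClient: Counters: 5
11/11/16 13:05:34 INFO mapred.JobClient:     Map input records=7
11/11/16 13:05:34 INFO mapred.JobClient:     Map output records=0
"""

for label, log in (("table only", table_only), ("with family", with_family)):
    counters = parse_counters(log)
    print(label, counters.get("ROWS"), counters.get("Map input records"))
```

Both runs read 7 map input records, but only the first one reports a `ROWS` counter; for the second run `counters.get("ROWS")` is `None`.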

J-D

On Tue, Nov 15, 2011 at 11:41 PM, Stuti Awasthi <st...@hcl.com> wrote:
> Hi,
> I tried to use MR RowCounter to count the rows of a table with specific column family. But it is not displaying correct result.
>
> Command (Only Table Name as argument ):  Hbase/hbase-0.90.3/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter Keyword
> Output :
> 11/11/16 13:04:31 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
> 11/11/16 13:04:32 INFO mapred.JobClient:  map 100% reduce 0%
> 11/11/16 13:04:32 INFO mapred.JobClient: Job complete: job_local_0001
> 11/11/16 13:04:32 INFO mapred.JobClient: Counters: 6
> 11/11/16 13:04:32 INFO mapred.JobClient:   org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
> 11/11/16 13:04:32 INFO mapred.JobClient:     ROWS=7
> 11/11/16 13:04:32 INFO mapred.JobClient:   FileSystemCounters
> 11/11/16 13:04:32 INFO mapred.JobClient:     FILE_BYTES_READ=2373099
> 11/11/16 13:04:32 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2411923
> 11/11/16 13:04:32 INFO mapred.JobClient:   Map-Reduce Framework
> 11/11/16 13:04:32 INFO mapred.JobClient:     Map input records=7
> 11/11/16 13:04:32 INFO mapred.JobClient:     Spilled Records=0
> 11/11/16 13:04:32 INFO mapred.JobClient:     Map output records=0
>
> Command (TableName, ColumnFamily): Hbase/hbase-0.90.3/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter Keyword Set
>
> Output :
> 11/11/16 13:05:33 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
> 11/11/16 13:05:34 INFO mapred.JobClient:  map 100% reduce 0%
> 11/11/16 13:05:34 INFO mapred.JobClient: Job complete: job_local_0001
> 11/11/16 13:05:34 INFO mapred.JobClient: Counters: 5
> 11/11/16 13:05:34 INFO mapred.JobClient:   FileSystemCounters
> 11/11/16 13:05:34 INFO mapred.JobClient:     FILE_BYTES_READ=2373107
> 11/11/16 13:05:34 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2411939
> 11/11/16 13:05:34 INFO mapred.JobClient:   Map-Reduce Framework
> 11/11/16 13:05:34 INFO mapred.JobClient:     Map input records=7
> 11/11/16 13:05:34 INFO mapred.JobClient:     Spilled Records=0
> 11/11/16 13:05:34 INFO mapred.JobClient:     Map output records=0
>
> Table Describe command Output is :
> TABLE => {{NAME => 'Keyword', FAMILIES => [{NAME => 'Info', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'Set', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>
> Am I executing in wrong way or this is some bug ?
>
> Regards,
> Stuti Awasthi
> HCL Comnet Systems and Services Ltd
> F-8/9 Basement, Sec-3,Noida.
>
>