Posted to user@hbase.apache.org by llpind <so...@hotmail.com> on 2009/06/10 00:12:19 UTC

Help with Map/Reduce program

Hi again,

I need some help with a map/reduce program I have which copies data from one
table to another.  What I would like to do is iterate through an entire
HBase table and, for a given row key and column family, count the number of
records.  So the output table will have a single column family named 'count'
(e.g. an entry would look something like 'rowKey1', 'count:count_for_rowkey1',
'534', where the row key could be the same as in the input table).

here is my first attempt:

CONF:
=========================================================================

c.setInputFormat(TableInputFormat.class);
c.setOutputFormat(TableOutputFormat.class);

TableMapReduceUtil.initTableMapJob("inputTableName", "colFam1:*",
        MapperClass.class, ImmutableBytesWritable.class, RowResult.class, c);

TableMapReduceUtil.initTableReduceJob("outputTableName", ReducerClass.class, c);

MapperClass:
=====================================================================

@Override
public void map(
        ImmutableBytesWritable key,
        RowResult row,
        OutputCollector<ImmutableBytesWritable, RowResult> collector,
        Reporter reporter) throws IOException {

    reporter.incrCounter(Counters.ROWS, 1);
    collector.collect(key, row);
}

ReducerClass:
=====================================================================

@Override
public void reduce(ImmutableBytesWritable k,
        Iterator<RowResult> v,
        OutputCollector<ImmutableBytesWritable, BatchUpdate> c,
        Reporter r) throws IOException {

    while (v.hasNext()){
        BatchUpdate bu = new BatchUpdate(k.get());
        while (v.hasNext()){
            RowResult row = v.next();
            bu.put(Bytes.toBytes("count:rowToCountName"), Bytes.toBytes(row.size()));
        }
        c.collect(k, bu);
    }
}

========================================================================

It runs the map/reduce, but I get nothing in my output table.  

Thanks.

llpind
-- 
View this message in context: http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23952252.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Help with Map/Reduce program

Posted by llpind <so...@hotmail.com>.
Thanks Ryan.  You are right, it is very much like word count.  Here is what I
have:

private final static IntWritable one = new IntWritable(1);

MAPPER
=====================================================
@Override
public void map(
        ImmutableBytesWritable key,
        RowResult row,
        OutputCollector<ImmutableBytesWritable, IntWritable> collector,
        Reporter r) throws IOException {
    // extractEntity trims TYPE|VALUE|ID to just TYPE|VALUE
    collector.collect(new ImmutableBytesWritable(extractEntity(key.get())), one);
}


REDUCER
=====================================================

public static class Reducer extends MapReduceBase implements
        TableReduce<ImmutableBytesWritable, IntWritable> {

    @Override
    public void reduce(ImmutableBytesWritable k,
            Iterator<IntWritable> v,
            OutputCollector<ImmutableBytesWritable, BatchUpdate> c,
            Reporter r) throws IOException {

        BatchUpdate bu = new BatchUpdate(k.get());
        int sum = 0;
        while (v.hasNext()) {
            sum += v.next().get();
        }
        bu.put("count:" + sum, String.valueOf(sum).getBytes());
        c.collect(k, bu);
    }
}

=====================================================

TableMapReduceUtil.initTableMapJob(inputTableName, "colFam:",
        Mapper.class, ImmutableBytesWritable.class, IntWritable.class, c);

TableMapReduceUtil.initTableReduceJob("output_count_table", Reducer.class, c);


======================================================

The input table has just one column family, which isn't even necessary.  The
output table also has just one column family, 'count'.  The goal is to put a
single entry in the output table along with the occurrence count.  So the
input table has row keys like TYPE|VALUE|1, TYPE|VALUE|2, etc. (with possibly
millions), and the output table should have row key TYPE|VALUE and value 2.
The problem I'm having is that I don't get the correct count; it's close but
not correct.  Is there something I'm doing incorrectly above?

I'm open to any suggestions. Thanks.


Ryan Rawson wrote:
> 
> This looks like a variant of word count.
> 
> In the map you filter out the rows you are interested in, and emit
> "ELECTRONICS|TV" as the key and just about anything as the value.  The in
> the reduce you count how many values there are, then do the batch update
> as
> you have below.
> 
> 
> 
> On Fri, Jun 12, 2009 at 10:04 AM, llpind <so...@hotmail.com> wrote:
> 
>>
>> I believe my map is collecting per row correctly, but reduce doesn't seem
>> to
>> be doing anything:
>> =============================================================
>>
>>                private RowResult previousRow = null;  //keep previous row
>>                private int counter = 0;  //counter for adding up like
>> TYPE|VALUE
>>                 @Override
>>                public void reduce(ImmutableBytesWritable k,
>>                                Iterator<RowResult> v,
>>                                OutputCollector<ImmutableBytesWritable,
>> BatchUpdate> c,
>>                                Reporter r) throws IOException {
>>
>>                         //keep counter for equal entities. row takes
>> TYPE|VALUE|LINKID form
>>                        while (v.hasNext()){
>>                                RowResult currentRow = v.next();
>>                                if (previousRow == null){
>>                                        previousRow = currentRow;
>>                                }
>>                                if
>> (extractEntity(currentRow).equals(extractEntity(previousRow))){
>>                                        ++counter;
>>                                }else{
>>                                        //commit previous row size, set
>> previous to current & reset counter
>>                                        BatchUpdate bu = new
>> BatchUpdate(extractEntity(previousRow));
>>                                        bu.put("count:"+counter,
>> String.valueOf(counter).getBytes());
>>                                        c.collect(new
>> ImmutableBytesWritable(previousRow.getRow()), bu);
>>                                        previousRow = currentRow;
>>                                        counter = 0;
>>
>>                                }
>>
>>                        }
>>
>> ==============================================
>> What am I doing wrong?
>>
>> The extract is simply getting the TYPE|VALUE only.
>>
>> What exactly do I have in the Iterator<RowResult> at this point?
>>
>> Thanks
>>
>> llpind wrote:
>> >
>> > If I have a tall table, what is returned in the reduce?   I'm still
>> > confused as to how things map up.
>> >
>> > for example assume I have ELECTRONICS|TV|ID2343 as the row key. There
>> are
>> > millions of these (ELECTRONICS|TV|ID234324, along with other products).
>> > I'd like to count the total # of IDs for all TVs.  How do I do this
>> with
>> > map/reduce?  I tried a few things, but was not able to get it working.
>> >
>> >
>> >
>> > Ryan Rawson wrote:
>> >>
>> >> Also remember you might be able to convert to a tall table. Row keys
>> can
>> >> be
>> >> compound and you can do partial left matches on them. Eg:
>> >>
>> >> Userid:timestamp:eventid
>> >>
>> >> now you have a tall table. Do prefix matches on the userid you want
>> and
>> >> you
>> >> get results in chronological order.
>> >>
>> >> You can build equivalent indexes in hbase as in sql. You may find a
>> >> design
>> >> like this alleviates the need for extremely wide rows.
>> >>
>> >> Good luck!
>> >>
>> >> On Jun 11, 2009 11:44 AM, "Billy Pearson" <sa...@pearsonwholesale.com>
>> >> wrote:
>> >>
>> >> That might be a good idea, but you might be able to redesign your
>> >> layout of the table using a different key than the current one; worth
>> >> brainstorming.
>> >>
>> >> Billy
>> >>
>> >>
>> >>
>> >> "llpind" <so...@hotmail.com> wrote in message
>> >> news:23975432.post@talk.nabble.com...
>> >>
>> >> Sorry, I forgot to mention that the overflow table then overflows into
>> >> new row keys per 10,000 column entries ...
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p24002766.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p24042336.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Help with Map/Reduce program

Posted by Ryan Rawson <ry...@gmail.com>.
This looks like a variant of word count.

In the map you filter out the rows you are interested in, and emit
"ELECTRONICS|TV" as the key and just about anything as the value.  The in
the reduce you count how many values there are, then do the batch update as
you have below.
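
Spelled out against the 0.19-era TableMap/TableReduce API used elsewhere in
this thread, the pattern is roughly the sketch below (extractEntity() and the
table/column names are placeholders, and this is a sketch rather than a
definitive implementation; it assumes the usual org.apache.hadoop.hbase.mapred
and org.apache.hadoop.mapred imports):

// Word-count-style counting sketch against the HBase 0.19-era mapred API.
public static class EntityCountMapper extends MapReduceBase
        implements TableMap<ImmutableBytesWritable, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);

    public void map(ImmutableBytesWritable key, RowResult row,
            OutputCollector<ImmutableBytesWritable, IntWritable> collector,
            Reporter reporter) throws IOException {
        // Emit the entity prefix (e.g. ELECTRONICS|TV) as the key; the value
        // only matters as something to count.
        collector.collect(new ImmutableBytesWritable(extractEntity(key.get())), ONE);
    }
}

public static class EntityCountReducer extends MapReduceBase
        implements TableReduce<ImmutableBytesWritable, IntWritable> {

    public void reduce(ImmutableBytesWritable key, Iterator<IntWritable> values,
            OutputCollector<ImmutableBytesWritable, BatchUpdate> collector,
            Reporter reporter) throws IOException {
        // All the 1s for one entity arrive in a single reduce call; add them up.
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        BatchUpdate update = new BatchUpdate(key.get());
        update.put(Bytes.toBytes("count:total"), String.valueOf(sum).getBytes());
        collector.collect(key, update); // one output row per entity
    }
}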



On Fri, Jun 12, 2009 at 10:04 AM, llpind <so...@hotmail.com> wrote:

>
> I believe my map is collecting per row correctly, but reduce doesn't seem to
> be doing anything:
> =============================================================
>
>                private RowResult previousRow = null;  //keep previous row
>                private int counter = 0;  //counter for adding up like
> TYPE|VALUE
>                 @Override
>                public void reduce(ImmutableBytesWritable k,
>                                Iterator<RowResult> v,
>                                OutputCollector<ImmutableBytesWritable,
> BatchUpdate> c,
>                                Reporter r) throws IOException {
>
>                         //keep counter for equal entities. row takes
> TYPE|VALUE|LINKID form
>                        while (v.hasNext()){
>                                RowResult currentRow = v.next();
>                                if (previousRow == null){
>                                        previousRow = currentRow;
>                                }
>                                if
> (extractEntity(currentRow).equals(extractEntity(previousRow))){
>                                        ++counter;
>                                }else{
>                                        //commit previous row size, set
> previous to current & reset counter
>                                        BatchUpdate bu = new
> BatchUpdate(extractEntity(previousRow));
>                                        bu.put("count:"+counter,
> String.valueOf(counter).getBytes());
>                                        c.collect(new
> ImmutableBytesWritable(previousRow.getRow()), bu);
>                                        previousRow = currentRow;
>                                        counter = 0;
>
>                                }
>
>                        }
>
> ==============================================
> What am I doing wrong?
>
> The extract is simply getting the TYPE|VALUE only.
>
> What exactly do I have in the Iterator<RowResult> at this point?
>
> Thanks
>
> llpind wrote:
> >
> > If I have a tall table, what is returned in the reduce?   I'm still
> > confused as to how things map up.
> >
> > for example assume I have ELECTRONICS|TV|ID2343 as the row key. There are
> > millions of these (ELECTRONICS|TV|ID234324, along with other products).
> > I'd like to count the total # of IDs for all TVs.  How do I do this with
> > map/reduce?  I tried a few things, but was not able to get it working.
> >
> >
> >
> > Ryan Rawson wrote:
> >>
> >> Also remember you might be able to convert to a tall table. Row keys can
> >> be
> >> compound and you can do partial left matches on them. Eg:
> >>
> >> Userid:timestamp:eventid
> >>
> >> now you have a tall table. Do prefix matches on the userid you want and
> >> you
> >> get results in chronological order.
> >>
> >> You can build equivalent indexes in hbase as in sql. You may find a
> >> design
> >> like this alleviates the need for extremely wide rows.
> >>
> >> Good luck!
> >>
> >> On Jun 11, 2009 11:44 AM, "Billy Pearson" <sa...@pearsonwholesale.com>
> >> wrote:
> >>
> >> That might be a good idea, but you might be able to redesign your layout
> >> of the table using a different key than the current one; worth
> >> brainstorming.
> >>
> >> Billy
> >>
> >>
> >>
> >> "llpind" <so...@hotmail.com> wrote in message
> >> news:23975432.post@talk.nabble.com...
> >>
> >> Sorry, I forgot to mention that the overflow table then overflows into
> >> new row keys per 10,000 column entries ...
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p24002766.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>

Re: Help with Map/Reduce program

Posted by llpind <so...@hotmail.com>.
I believe my map is collecting per row correctly, but reduce doesn't seem to
be doing anything:
=============================================================

		private RowResult previousRow = null;  //keep previous row
		private int counter = 0;  //counter for adding up like TYPE|VALUE
		@Override
		public void reduce(ImmutableBytesWritable k,
				Iterator<RowResult> v,
				OutputCollector<ImmutableBytesWritable, BatchUpdate> c,
				Reporter r) throws IOException {
			
			//keep counter for equal entities. row takes TYPE|VALUE|LINKID form
			while (v.hasNext()){
				RowResult currentRow = v.next();
				if (previousRow == null){
					previousRow = currentRow;
				}
				if (extractEntity(currentRow).equals(extractEntity(previousRow))){
					++counter;
				}else{
					//commit previous row size, set previous to current & reset counter
					BatchUpdate bu = new BatchUpdate(extractEntity(previousRow));
					bu.put("count:"+counter, String.valueOf(counter).getBytes());
					c.collect(new ImmutableBytesWritable(previousRow.getRow()), bu);
					previousRow = currentRow;
					counter = 0;
					
				}
				
			}

==============================================
What am I doing wrong?

The extract is simply getting the TYPE|VALUE only.

What exactly do I have in the Iterator<RowResult> at this point?

Thanks

llpind wrote:
> 
> If I have a tall table, what is returned in the reduce?   I'm still
> confused as to how things map up.
> 
> for example assume I have ELECTRONICS|TV|ID2343 as the row key. There are
> millions of these (ELECTRONICS|TV|ID234324, along with other products). 
> I'd like to count the total # of IDs for all TVs.  How do I do this with
> map/reduce?  I tried a few things, but was not able to get it working.
> 
> 
> 
> Ryan Rawson wrote:
>> 
>> Also remember you might be able to convert to a tall table. Row keys can
>> be
>> compound and you can do partial left matches on them. Eg:
>> 
>> Userid:timestamp:eventid
>> 
>> now you have a tall table. Do prefix matches on the userid you want and
>> you
>> get results in chronological order.
>> 
>> You can build equivalent indexes in hbase as in sql. You may find a
>> design
>> like this alleviates the need for extremely wide rows.
>> 
>> Good luck!
>> 
>> On Jun 11, 2009 11:44 AM, "Billy Pearson" <sa...@pearsonwholesale.com>
>> wrote:
>> 
>> That might be a good idea, but you might be able to redesign your layout of
>> the table using a different key than the current one; worth brainstorming.
>> 
>> Billy
>> 
>> 
>> 
>> "llpind" <so...@hotmail.com> wrote in message
>> news:23975432.post@talk.nabble.com...
>> 
>> Sorry, I forgot to mention that the overflow table then overflows into new
>> row keys per 10,000 column entries ...
>> 
>> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p24002766.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Help with Map/Reduce program

Posted by llpind <so...@hotmail.com>.
If I have a tall table, what is returned in the reduce?   I'm still confused
as to how things map up.

for example assume I have ELECTRONICS|TV|ID2343 as the row key. There are
millions of these (ELECTRONICS|TV|ID234324, along with other products).  I'd
like to count the total # of IDs for all TVs.  How do I do this with
map/reduce?  I tried a few things, but was not able to get it working.



Ryan Rawson wrote:
> 
> Also remember you might be able to convert to a tall table. Row keys can
> be
> compound and you can do partial left matches on them. Eg:
> 
> Userid:timestamp:eventid
> 
> now you have a tall table. Do prefix matches on the userid you want and
> you
> get results in chronological order.
> 
> You can build equivalent indexes in hbase as in sql. You may find a design
> like this alleviates the need for extremely wide rows.
> 
> Good luck!
> 
> On Jun 11, 2009 11:44 AM, "Billy Pearson" <sa...@pearsonwholesale.com>
> wrote:
> 
> That might be a good idea, but you might be able to redesign your layout of
> the table using a different key than the current one; worth brainstorming.
> 
> Billy
> 
> 
> 
> "llpind" <so...@hotmail.com> wrote in message
> news:23975432.post@talk.nabble.com...
> 
> Sorry, I forgot to mention that the overflow table then overflows into new
> row keys per 10,000 column entries ...
> 
> 

-- 
View this message in context: http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p24002604.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Help with Map/Reduce program

Posted by llpind <so...@hotmail.com>.
That’s funny; we just got out of a meeting discussing this.  Yes, we can move
the columns into the row key, making a really tall table.  I hope this will
perform better than a large # of columns.  We will have to find a good
delimiter which will work generically across the data set.


Ryan Rawson wrote:
> 
> Also remember you might be able to convert to a tall table. Row keys can
> be
> compound and you can do partial left matches on them. Eg:
> 
> Userid:timestamp:eventid
> 
> now you have a tall table. Do prefix matches on the userid you want and
> you
> get results in chronological order.
> 
> You can build equivalent indexes in hbase as in sql. You may find a design
> like this alleviates the need for extremely wide rows.
> 
> Good luck!
> 
> On Jun 11, 2009 11:44 AM, "Billy Pearson" <sa...@pearsonwholesale.com>
> wrote:
> 
> That might be a good idea, but you might be able to redesign your layout of
> the table using a different key than the current one; worth brainstorming.
> 
> Billy
> 
> 
> 
> "llpind" <so...@hotmail.com> wrote in message
> news:23975432.post@talk.nabble.com...
> 
> Sorry, I forgot to mention that the overflow table then overflows into new
> row keys per 10,000 column entries ...
> 
> 

-- 
View this message in context: http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23989319.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Help with Map/Reduce program

Posted by Ryan Rawson <ry...@gmail.com>.
Also remember you might be able to convert to a tall table. Row keys can be
compound and you can do partial left matches on them. Eg:

Userid:timestamp:eventid

now you have a tall table. Do prefix matches on the userid you want and you
get results in chronological order.

You can build equivalent indexes in hbase as in sql. You may find a design
like this alleviates the need for extremely wide rows.

Good luck!
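
For example, a prefix scan over such compound keys might look roughly like the
sketch below with the 0.19-era client API (the table name, family name, and
"user42:" prefix are made up for illustration):

// Hypothetical prefix scan over "userid:timestamp:eventid" row keys.
HTable table = new HTable(new HBaseConfiguration(), "events");
String prefix = "user42:";
Scanner scanner = table.getScanner(
        new byte[][] { Bytes.toBytes("colFam1:") }, Bytes.toBytes(prefix));
try {
    for (RowResult row : scanner) {
        // Rows come back in sorted key order, so stop at the first row key
        // that no longer starts with the prefix.
        if (!Bytes.toString(row.getRow()).startsWith(prefix)) {
            break;
        }
        // ... events for user42, in chronological (timestamp) order ...
    }
} finally {
    scanner.close();
}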

On Jun 11, 2009 11:44 AM, "Billy Pearson" <sa...@pearsonwholesale.com>
wrote:

That might be a good idea, but you might be able to redesign your layout of
the table using a different key than the current one; worth brainstorming.

Billy



"llpind" <so...@hotmail.com> wrote in message
news:23975432.post@talk.nabble.com...

Sorry, I forgot to mention that the overflow table then overflows into new
row keys per 10,000 column entries ...

Re: Help with Map/Reduce program

Posted by Billy Pearson <sa...@pearsonwholesale.com>.
That might be a good idea, but you might be able to redesign your layout of
the table using a different key than the current one; worth brainstorming.

Billy



"llpind" <so...@hotmail.com> wrote in 
message news:23975432.post@talk.nabble.com...

Sorry, I forgot to mention that the overflow table then overflows into new
row keys per 10,000 column entries (or some other split number).



llpind wrote:
>
>
> When is .20 planned for release?  This particular issue is really
> important to us.
>
> Stack, I also have another question: The problem we are trying to solve
> doesn't really need the extra layer present in HBase (BigTable) structure
> (RowResult holds row key and a HashMap of column name, value). What we
> really need is a row key which simply holds a set of values.  Essentially
> this is a many-to-many.  I wanted your thoughts on how we can go about
> solving this problem (we can start another post for this if you’d like).
> Is this something HBase can solve, or something that could potentially be
> a HBase fork?  Right now we are still in test mode, and only having to
> deal with millions of columns, but in production (if the company sticks
> with HBase) the columns could be in the billions.  One idea we came up
> with is to have an overflow table… e.g.
>
> For a given row key we list the first 10,000 columns (values in our case),
> and after that we create a column with an overflow id pointing an overflow
> table which is keyed on this id.
>
> This looks like it may work, but it isn’t the most elegant solution.  I’d
> appreciate input from anyone on this issue.   Please let me know if you
> need me to explain our problem in more detail.
>
>
>
> stack-3 wrote:
>>
>> On Wed, Jun 10, 2009 at 4:52 PM, llpind 
>> <so...@hotmail.com> wrote:
>>
>>>
>>> Thanks.  I think the problem is I have potentially millions of columns.
>>>
>>
>>> where a given RowResult can hold millions of columns to values.   That's
>>> why Map/Reduce is having problems as well (Java heap exception).  I've
>>> upped mapred.child.java.opts, but the problem persists.
>>>
>>
>> See also HBASE-867: https://issues.apache.org/jira/browse/HBASE-867
>> St.Ack
>>
>>
>
>

-- 
View this message in context: 
http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23975432.html
Sent from the HBase User mailing list archive at Nabble.com.




Re: Help with Map/Reduce program

Posted by llpind <so...@hotmail.com>.
Sorry, I forgot to mention that the overflow table then overflows into new
row keys per 10,000 column entries (or some other split number).



llpind wrote:
> 
> 
> When is .20 planned for release?  This particular issue is really
> important to us. 
> 
> Stack, I also have another question: The problem we are trying to solve
> doesn't really need the extra layer present in HBase (BigTable) structure
> (RowResult holds row key and a HashMap of column name, value). What we
> really need is a row key which simply holds a set of values.  Essentially
> this is a many-to-many.  I wanted your thoughts on how we can go about
> solving this problem (we can start another post for this if you’d like).
> Is this something HBase can solve, or something that could potentially be
> a HBase fork?  Right now we are still in test mode, and only having to
> deal with millions of columns, but in production (if the company sticks
> with HBase) the columns could be in the billions.  One idea we came up
> with is to have an overflow table… e.g.
> 
> For a given row key we list the first 10,000 columns (values in our case),
> and after that we create a column with an overflow id pointing an overflow
> table which is keyed on this id.
> 
> This looks like it may work, but it isn’t the most elegant solution.  I’d
> appreciate input from anyone on this issue.   Please let me know if you
> need me to explain our problem in more detail.
> 
> 
> 
> stack-3 wrote:
>> 
>> On Wed, Jun 10, 2009 at 4:52 PM, llpind <so...@hotmail.com> wrote:
>> 
>>>
>>> Thanks.  I think the problem is I have potentially millions of columns.
>>>
>> 
>>> where a given RowResult can hold millions of columns to values.   That's
>>> why Map/Reduce is having problems as well (Java heap exception).  I've
>>> upped mapred.child.java.opts, but the problem persists.
>>>
>> 
>> See also HBASE-867: https://issues.apache.org/jira/browse/HBASE-867
>> St.Ack
>> 
>> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23975432.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Help with Map/Reduce program

Posted by llpind <so...@hotmail.com>.

When is .20 planned for release?  This particular issue is really
important to us. 

Stack, I also have another question: The problem we are trying to solve
doesn't really need the extra layer present in HBase (BigTable) structure
(RowResult holds row key and a HashMap of column name, value). What we
really need is a row key which simply holds a set of values.  Essentially
this is a many-to-many.  I wanted your thoughts on how we can go about
solving this problem (we can start another post for this if you’d like). Is
this something HBase can solve, or something that could potentially be a
HBase fork?  Right now we are still in test mode, and only having to deal
with millions of columns, but in production (if the company sticks with
HBase) the columns could be in the billions.  One idea we came up with is to
have an overflow table… e.g.

For a given row key we list the first 10,000 columns (values in our case),
and after that we create a column with an overflow id pointing an overflow
table which is keyed on this id.

This looks like it may work, but it isn’t the most elegant solution.  I’d
appreciate input from anyone on this issue.   Please let me know if you
need me to explain our problem in more detail.
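
As a rough sketch of the shape of that layout (all the names and the threshold
here are made up; this is the idea, not code we have running):

// Illustrative only: spill values into an overflow table once a row passes
// some threshold, leaving a pointer column behind in the main row.
static final int SPLIT_SIZE = 10000; // made-up split point

void write(HTable mainTable, HTable overflowTable, byte[] rowKey,
        byte[] column, byte[] value, int columnsSoFar) throws IOException {
    if (columnsSoFar < SPLIT_SIZE) {
        BatchUpdate main = new BatchUpdate(rowKey);
        main.put(column, value);
        mainTable.commit(main);
    } else {
        // Past the threshold: write into the overflow table under a derived key...
        byte[] overflowKey = Bytes.add(rowKey, Bytes.toBytes("|overflow"));
        BatchUpdate spill = new BatchUpdate(overflowKey);
        spill.put(column, value);
        overflowTable.commit(spill);
        // ...and leave a pointer in the main row so readers can follow it.
        BatchUpdate pointer = new BatchUpdate(rowKey);
        pointer.put(Bytes.toBytes("meta:overflow"), overflowKey);
        mainTable.commit(pointer);
    }
}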



stack-3 wrote:
> 
> On Wed, Jun 10, 2009 at 4:52 PM, llpind <so...@hotmail.com> wrote:
> 
>>
>> Thanks.  I think the problem is I have potentially millions of columns.
>>
> 
>> where a given RowResult can hold millions of columns to values.   That's
>> why Map/Reduce is having problems as well (Java heap exception).  I've
>> upped mapred.child.java.opts, but the problem persists.
>>
> 
> See also HBASE-867: https://issues.apache.org/jira/browse/HBASE-867
> St.Ack
> 
> 

-- 
View this message in context: http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23975405.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Help with Map/Reduce program

Posted by stack <st...@duboce.net>.
On Wed, Jun 10, 2009 at 4:52 PM, llpind <so...@hotmail.com> wrote:

>
> Thanks.  I think the problem is I have potentially millions of columns.
>

> where a given RowResult can hold millions of columns to values.   That's why
> Map/Reduce is having problems as well (Java heap exception).  I've upped
> mapred.child.java.opts, but the problem persists.
>

See also HBASE-867: https://issues.apache.org/jira/browse/HBASE-867
St.Ack

Re: Help with Map/Reduce program

Posted by llpind <so...@hotmail.com>.
Thanks.  I think the problem is I have potentially millions of columns.

where a given RowResult can hold millions of columns to values.   That's why
Map/Reduce is having problems as well (Java heap exception).  I've upped
mapred.child.java.opts, but the problem persists.


Ryan Rawson wrote:
> 
> Hey,
> 
> A scanner's lease expires in 60 seconds.  I'm not sure what version you
> are
> using, but try:
> table.setScannerCaching(1);
> 
> This way you won't retrieve 60 rows that each take 1-2 seconds to process.
> 
> This is the new default value in 0.20, but I don't know if it ended up in
> 0.19.x anywhere.
> 
> 
> On Wed, Jun 10, 2009 at 2:14 PM, llpind <so...@hotmail.com> wrote:
> 
>>
>> Okay, I think I got it figured out.
>>
>> although when scanning large row keys I do get the following exception:
>>
>> NativeException: java.lang.RuntimeException:
>> org.apache.hadoop.hbase.UnknownScannerException:
>> org.apache.hadoop.hbase.UnknownScannerException: -4424757523660246367
>>        at
>>
>> org.apache.hadoop.hbase.regionserver.HRegionServer.close(HRegionServer.java:1745)
>>        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>>        at
>>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at
>> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
>>        at
>> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
>>
>>        from org/apache/hadoop/hbase/client/HTable.java:1741:in `hasNext'
>>        from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
>>        from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
>>        from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
>>        from java/lang/reflect/Method.java:597:in `invoke'
>>        from org/jruby/javasupport/JavaMethod.java:298:in
>> `invokeWithExceptionHandling'
>>        from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
>>        from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in
>> `call'
>>        from org/jruby/runtime/callsite/CachingCallSite.java:73:in `call'
>>        from org/jruby/ast/CallNoArgNode.java:61:in `interpret'
>>        from org/jruby/ast/WhileNode.java:124:in `interpret'
>>        from org/jruby/ast/NewlineNode.java:101:in `interpret'
>>        from org/jruby/ast/BlockNode.java:68:in `interpret'
>>        from org/jruby/internal/runtime/methods/DefaultMethod.java:156:in
>> `interpretedCall'
>>        from org/jruby/internal/runtime/methods/DefaultMethod.java:133:in
>> `call'
>>        from org/jruby/internal/runtime/methods/DefaultMethod.java:246:in
>> `call'
>> ... 108 levels...
>>        from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in
>> `call'
>>        from org/jruby/internal/runtime/methods/CompiledMethod.java:216:in
>> `call'
>>        from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in
>> `call'
>>        from org/jruby/runtime/callsite/CachingCallSite.java:260:in
>> `cacheAndCall'
>>        from org/jruby/runtime/callsite/CachingCallSite.java:75:in `call'
>>        from home/hadoop/hbase193/bin/$_dot_dot_/bin/hirb.rb:441:in
>> `__file__'
>>        from home/hadoop/hbase193/bin/$_dot_dot_/bin/hirb.rb:-1:in
>> `__file__'
>>        from home/hadoop/hbase193/bin/$_dot_dot_/bin/hirb.rb:-1:in `load'
>>        from org/jruby/Ruby.java:564:in `runScript'
>>        from org/jruby/Ruby.java:467:in `runNormally'
>>        from org/jruby/Ruby.java:340:in `runFromMain'
>>        from org/jruby/Main.java:214:in `run'
>>        from org/jruby/Main.java:100:in `run'
>>        from org/jruby/Main.java:84:in `main'
>>        from /home/hadoop/hbase193/bin/../bin/hirb.rb:346:in `scan'
>>
>>
>> ===================================================
>>
>> Is there an easy way around this problem?
>>
>>
>>
>>
>> Billy Pearson-2 wrote:
>> >
>> > Yes, that's what scanners are good for; they will return all the
>> > column:label combos for a row.
>> > What do the MR job stats say for rows processed for the maps and
>> > reduces?
>> >
>> > Billy Pearson
>> >
>> >
>> >
>> > "llpind" <so...@hotmail.com> wrote in
>> > message news:23967196.post@talk.nabble.com...
>> >>
>> >> also,
>> >>
>> >> I think what we want is a way to wildcard everything after colFam1:
>> >> (e.g.
>> >> colFam1:*).  Is there a way to do this in HBase?
>> >>
>> >> This is assuming we don't know the column names; we want them all.
>> >>
>> >>
>> >> llpind wrote:
>> >>>
>> >>> Thanks.
>> >>>
>> >>> Yea I've got that colFam for sure in the HBase table:
>> >>>
>> >>> {NAME => 'tableA', FAMILIES => [{NAME => 'colFam1', VERSIONS => '3',
>> >>> COMPRESSION => 'NONE', LENGTH => '2147483647',
>> >>>  TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME =>
>> >>> 'colFam2', VERSIONS => '3', COMPRESSION =>
>> >>>  'NONE', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false',
>> >>> BLOCKCACHE => 'false'}]}
>> >>>
>> >>>
>> >>> I've been trying to play with rowcounter, and not having much luck
>> >>> either.
>> >>>
>> >>> I run the command:
>> >>> hadoop19/bin/hadoop org.apache.hadoop.hbase.mapred.Driver rowcounter
>> >>> /home/hadoop/dev/rowcounter7 tableA colFam1:
>> >>>
>> >>>
>> >>> The map/reduce finishes just like it does with my own program, but
>> with
>> >>> all part files empty in /home/hadoop/dev/rowcounter7.
>> >>>
>> >>> Any Ideas?
>> >>>
>> >>>
>> >>
>> >> --
>> >> View this message in context:
>> >>
>> http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23967196.html
>> >> Sent from the HBase User mailing list archive at Nabble.com.
>> >>
>> >>
>> >
>> >
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23971190.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23973170.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Help with Map/Reduce program

Posted by Ryan Rawson <ry...@gmail.com>.
Hey,

A scanner's lease expires in 60 seconds.  I'm not sure what version you are
using, but try:
table.setScannerCaching(1);

This way you won't retrieve 60 rows that each take 1-2 seconds to process.

This is the new default value in 0.20, but I don't know if it ended up in
0.19.x anywhere.
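
In client code that looks something like this (the table and family names are
placeholders, and as noted setScannerCaching() may not exist in 0.19.x):

// Sketch: fetch 1 row per RPC so slow per-row processing doesn't outlive
// the 60-second scanner lease.
HTable table = new HTable(new HBaseConfiguration(), "tableA");
table.setScannerCaching(1);
Scanner scanner = table.getScanner(
        new byte[][] { Bytes.toBytes("colFam1:") }, HConstants.EMPTY_START_ROW);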


On Wed, Jun 10, 2009 at 2:14 PM, llpind <so...@hotmail.com> wrote:

>
> Okay, I think I got it figured out.
>
> although when scanning large row keys I do get the following exception:
>
> NativeException: java.lang.RuntimeException:
> org.apache.hadoop.hbase.UnknownScannerException:
> org.apache.hadoop.hbase.UnknownScannerException: -4424757523660246367
>        at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.close(HRegionServer.java:1745)
>        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>        at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
>        at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
>
>        from org/apache/hadoop/hbase/client/HTable.java:1741:in `hasNext'
>        from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
>        from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
>        from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
>        from java/lang/reflect/Method.java:597:in `invoke'
>        from org/jruby/javasupport/JavaMethod.java:298:in
> `invokeWithExceptionHandling'
>        from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
>        from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call'
>        from org/jruby/runtime/callsite/CachingCallSite.java:73:in `call'
>        from org/jruby/ast/CallNoArgNode.java:61:in `interpret'
>        from org/jruby/ast/WhileNode.java:124:in `interpret'
>        from org/jruby/ast/NewlineNode.java:101:in `interpret'
>        from org/jruby/ast/BlockNode.java:68:in `interpret'
>        from org/jruby/internal/runtime/methods/DefaultMethod.java:156:in
> `interpretedCall'
>        from org/jruby/internal/runtime/methods/DefaultMethod.java:133:in
> `call'
>        from org/jruby/internal/runtime/methods/DefaultMethod.java:246:in
> `call'
> ... 108 levels...
>        from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in
> `call'
>        from org/jruby/internal/runtime/methods/CompiledMethod.java:216:in
> `call'
>        from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in
> `call'
>        from org/jruby/runtime/callsite/CachingCallSite.java:260:in
> `cacheAndCall'
>        from org/jruby/runtime/callsite/CachingCallSite.java:75:in `call'
>        from home/hadoop/hbase193/bin/$_dot_dot_/bin/hirb.rb:441:in
> `__file__'
>        from home/hadoop/hbase193/bin/$_dot_dot_/bin/hirb.rb:-1:in
> `__file__'
>        from home/hadoop/hbase193/bin/$_dot_dot_/bin/hirb.rb:-1:in `load'
>        from org/jruby/Ruby.java:564:in `runScript'
>        from org/jruby/Ruby.java:467:in `runNormally'
>        from org/jruby/Ruby.java:340:in `runFromMain'
>        from org/jruby/Main.java:214:in `run'
>        from org/jruby/Main.java:100:in `run'
>        from org/jruby/Main.java:84:in `main'
>        from /home/hadoop/hbase193/bin/../bin/hirb.rb:346:in `scan'
>
>
> ===================================================
>
> Is there an easy way around this problem?
>
>
>
>
> Billy Pearson-2 wrote:
> >
> > Yes, that's what scanners are good for; they will return all the
> > column:label combos for a row.
> > What do the MR job stats say for rows processed for the maps and
> > reduces?
> >
> > Billy Pearson
> >
> >
> >
> > "llpind" <so...@hotmail.com> wrote in
> > message news:23967196.post@talk.nabble.com...
> >>
> >> also,
> >>
> >> I think what we want is a way to wildcard everything after colFam1:
> >> (e.g.
> >> colFam1:*).  Is there a way to do this in HBase?
> >>
> >> This is assuming we don't know the column names; we want them all.
> >>
> >>
> >> llpind wrote:
> >>>
> >>> Thanks.
> >>>
> >>> Yea I've got that colFam for sure in the HBase table:
> >>>
> >>> {NAME => 'tableA', FAMILIES => [{NAME => 'colFam1', VERSIONS => '3',
> >>> COMPRESSION => 'NONE', LENGTH => '2147483647',
> >>>  TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME =>
> >>> 'colFam2', VERSIONS => '3', COMPRESSION =>
> >>>  'NONE', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false',
> >>> BLOCKCACHE => 'false'}]}
> >>>
> >>>
> >>> I've been trying to play with rowcounter, and not having much luck
> >>> either.
> >>>
> >>> I run the command:
> >>> hadoop19/bin/hadoop org.apache.hadoop.hbase.mapred.Driver rowcounter
> >>> /home/hadoop/dev/rowcounter7 tableA colFam1:
> >>>
> >>>
> >>> The map/reduce finishes just like it does with my own program, but with
> >>> all part files empty in /home/hadoop/dev/rowcounter7.
> >>>
> >>> Any Ideas?
> >>>
> >>>
> >>
> >> --
> >> View this message in context:
> >>
> http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23967196.html
> >> Sent from the HBase User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23971190.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>

Re: Help with Map/Reduce program

Posted by llpind <so...@hotmail.com>.
Okay, I think I got it figured out.

although when scanning large row keys I do get the following exception:

NativeException: java.lang.RuntimeException:
org.apache.hadoop.hbase.UnknownScannerException:
org.apache.hadoop.hbase.UnknownScannerException: -4424757523660246367
        at
org.apache.hadoop.hbase.regionserver.HRegionServer.close(HRegionServer.java:1745)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
        at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)

        from org/apache/hadoop/hbase/client/HTable.java:1741:in `hasNext'
        from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
        from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
        from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
        from java/lang/reflect/Method.java:597:in `invoke'
        from org/jruby/javasupport/JavaMethod.java:298:in
`invokeWithExceptionHandling'
        from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
        from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call'
        from org/jruby/runtime/callsite/CachingCallSite.java:73:in `call'
        from org/jruby/ast/CallNoArgNode.java:61:in `interpret'
        from org/jruby/ast/WhileNode.java:124:in `interpret'
        from org/jruby/ast/NewlineNode.java:101:in `interpret'
        from org/jruby/ast/BlockNode.java:68:in `interpret'
        from org/jruby/internal/runtime/methods/DefaultMethod.java:156:in
`interpretedCall'
        from org/jruby/internal/runtime/methods/DefaultMethod.java:133:in
`call'
        from org/jruby/internal/runtime/methods/DefaultMethod.java:246:in
`call'
... 108 levels...
        from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in
`call'
        from org/jruby/internal/runtime/methods/CompiledMethod.java:216:in
`call'
        from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in
`call'
        from org/jruby/runtime/callsite/CachingCallSite.java:260:in
`cacheAndCall'
        from org/jruby/runtime/callsite/CachingCallSite.java:75:in `call'
        from home/hadoop/hbase193/bin/$_dot_dot_/bin/hirb.rb:441:in
`__file__'
        from home/hadoop/hbase193/bin/$_dot_dot_/bin/hirb.rb:-1:in
`__file__'
        from home/hadoop/hbase193/bin/$_dot_dot_/bin/hirb.rb:-1:in `load'
        from org/jruby/Ruby.java:564:in `runScript'
        from org/jruby/Ruby.java:467:in `runNormally'
        from org/jruby/Ruby.java:340:in `runFromMain'
        from org/jruby/Main.java:214:in `run'
        from org/jruby/Main.java:100:in `run'
        from org/jruby/Main.java:84:in `main'
        from /home/hadoop/hbase193/bin/../bin/hirb.rb:346:in `scan'


===================================================

Is there an easy way around this problem?  




Billy Pearson-2 wrote:
> 
> Yes, that's what scanners are good for; they will return all the
> column:label combos for a row.
> What do the MR job stats say for rows processed for the maps and
> reduces?
> 
> Billy Pearson
> 
> 
> 
> "llpind" <so...@hotmail.com> wrote in 
> message news:23967196.post@talk.nabble.com...
>>
>> also,
>>
>> I think what we want is a way to wildcard everything after colFam1: 
>> (e.g.
>> colFam1:*).  Is there a way to do this in HBase?
>>
>> This is assuming we don't know the column names; we want them all.
>>
>>
>> llpind wrote:
>>>
>>> Thanks.
>>>
>>> Yea I've got that colFam for sure in the HBase table:
>>>
>>> {NAME => 'tableA', FAMILIES => [{NAME => 'colFam1', VERSIONS => '3',
>>> COMPRESSION => 'NONE', LENGTH => '2147483647',
>>>  TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME =>
>>> 'colFam2', VERSIONS => '3', COMPRESSION =>
>>>  'NONE', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false',
>>> BLOCKCACHE => 'false'}]}
>>>
>>>
>>> I've been trying to play with rowcounter, and not having much luck 
>>> either.
>>>
>>> I run the command:
>>> hadoop19/bin/hadoop org.apache.hadoop.hbase.mapred.Driver rowcounter
>>> /home/hadoop/dev/rowcounter7 tableA colFam1:
>>>
>>>
>>> The map/reduce finishes just like it does with my own program, but with
>>> all part files empty in /home/hadoop/dev/rowcounter7.
>>>
>>> Any Ideas?
>>>
>>>
>>
>> -- 
>> View this message in context: 
>> http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23967196.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>> 
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23971190.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Help with Map/Reduce program

Posted by Billy Pearson <sa...@pearsonwholesale.com>.
Yes, that's what scanners are good for; they will return all the
column:label combos for a row.
What do the MR job stats say for rows processed for the maps and reduces?

Billy Pearson



"llpind" <so...@hotmail.com> wrote in 
message news:23967196.post@talk.nabble.com...
>
> also,
>
> I think what we want is a way to wildcard everything after colFam1: 
> (e.g.
> colFam1:*).  Is there a way to do this in HBase?
>
> This is assuming we don't know the column names; we want them all.
>
>
> llpind wrote:
>>
>> Thanks.
>>
>> Yea I've got that colFam for sure in the HBase table:
>>
>> {NAME => 'tableA', FAMILIES => [{NAME => 'colFam1', VERSIONS => '3',
>> COMPRESSION => 'NONE', LENGTH => '2147483647',
>>  TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME =>
>> 'colFam2', VERSIONS => '3', COMPRESSION =>
>>  'NONE', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false',
>> BLOCKCACHE => 'false'}]}
>>
>>
>> I've been trying to play with rowcounter, and not having much luck 
>> either.
>>
>> I run the command:
>> hadoop19/bin/hadoop org.apache.hadoop.hbase.mapred.Driver rowcounter
>> /home/hadoop/dev/rowcounter7 tableA colFam1:
>>
>>
>> The map/reduce finishes just like it does with my own program, but with
>> all part files empty in /home/hadoop/dev/rowcounter7.
>>
>> Any Ideas?
>>
>>
>
> -- 
> View this message in context: 
> http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23967196.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
> 



Re: Help with Map/Reduce program

Posted by llpind <so...@hotmail.com>.
also,

I think what we want is a way to wildcard everything after colFam1:   (e.g.
colFam1:*).  Is there a way to do this in HBase?

This is assuming we don't know the column names; we want them all.


llpind wrote:
> 
> Thanks.
> 
> Yea I've got that colFam for sure in the HBase table:
> 
> {NAME => 'tableA', FAMILIES => [{NAME => 'colFam1', VERSIONS => '3',
> COMPRESSION => 'NONE', LENGTH => '2147483647',
>  TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME =>
> 'colFam2', VERSIONS => '3', COMPRESSION =>
>  'NONE', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false',
> BLOCKCACHE => 'false'}]}
> 
> 
> I've been trying to play with rowcounter, and not having much luck either.
> 
> I run the command:
> hadoop19/bin/hadoop org.apache.hadoop.hbase.mapred.Driver rowcounter
> /home/hadoop/dev/rowcounter7 tableA colFam1:
> 
> 
> The map/reduce finishes just like it does with my own program, but with
> all part files empty in /home/hadoop/dev/rowcounter7.   
> 
> Any Ideas?
> 
> 

-- 
View this message in context: http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23967196.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Help with Map/Reduce program

Posted by stack <st...@duboce.net>.
rowcounter counts rows only.  It does not produce any output; the count shows
up in the job's counters.
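
With the old mapred API you can read that counter back roughly as sketched
below (the group and counter names mirror RowCounter's internal Counters.ROWS
enum and are shown here as an assumption, not verified against the source):

// Sketch: the row count is reported as a MapReduce job counter, not as
// file or table output.
RunningJob running = JobClient.runJob(jobConf); // jobConf is the configured JobConf
long rows = running.getCounters()
        .getGroup("org.apache.hadoop.hbase.mapred.RowCounter$Counters")
        .getCounter("ROWS");
System.out.println("rows = " + rows);
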
St.Ack

On Wed, Jun 10, 2009 at 10:03 AM, llpind <so...@hotmail.com> wrote:

>
> Thanks.
>
> Yea I've got that colFam for sure in the HBase table.
>
> I've been trying to play with rowcounter, and not having much luck either.
>
> I run the command:
> hadoop19/bin/hadoop org.apache.hadoop.hbase.mapred.Driver rowcounter
> /home/hadoop/dev/rowcounter7 tableA colFam1:
>
>
> The map/reduce finishes just like it does with my own program, but with all
> part files empty in /home/hadoop/dev/rowcounter7.
>
> Any Ideas?
>
> Billy Pearson-2 wrote:
> >
> > You could try scanning it with shell to make sure there is data
> > bin/hbase shell
> > help ->  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, \
> >              STARTROW => 'xyz'}
> >
> > so something like
> > scan 'tablename', {COLUMNS => ['col1']}
> >
> > That will spit out data if there is any.
> > I think you might be able to even call
> > scan 'tablename'
> >
> > Billy
> >
> >
> >
> >
> > "llpind" <so...@hotmail.com> wrote in
> > message news:23954242.post@talk.nabble.com...
> >>
> >> Yeah I noticed that shortly after I posted.  I have it "colFam1:" now.
> >> I'm
> >> positive my table has that column family, but my output table still has
> >> nothing in it.
> >>
> >> I'm looking at the source code for rowcounter, and it doesn't even
> >> require ':'.  Does it need to be passed in?
> >>
> >>
> >> I may be going about this wrong, I'm open to ideas.  I need a way to
> >> iterate
> >> over an entire HBase table, and count columns (column counter instead of
> >> rowcounter).
> >>
> >> Billy Pearson-2 wrote:
> >>>
> >>> try without the * in the column
> >>> "colFam1:*",
> >>> try
> >>> "colFam1:",
> >>>
> >>> I do not think the * works as an all option; just leave it blank
> >>> (colFam1:) and it will give all results
> >>>
> >>> Billy
> >>>
> >>>
> >>> "llpind" <so...@hotmail.com> wrote in
> >>> message
> >>> news:23952252.post@talk.nabble.com...
> >>>>
> >>>> Hi again,
> >>>>
> >>>> I need some help with a map/reduce program I have which copies data
> >>>> from
> >>>> one
> >>>> table to another.  What I would like to do is iterate through an
> entire
> >>>> HBase table, and for a given row key and column family count the
> number
> >>>> of
> >>>> records.  So the output table will have a single column family named
> >>>> 'count'
> >>>> (e.g. entry would look something like 'rowKey1',
> >>>> 'count:count_for_rowkey1',
> >>>> '534', where the rowkey could be the same as input table ).
> >>>>
> >>>> here is my first attempt:
> >>>>
> >>>> CONF:
> >>>>
> =========================================================================
> >>>>
> >>>> c.setInputFormat(TableInputFormat.class);
> >>>> c.setOutputFormat(TableOutputFormat.class);
> >>>>
> >>>>     TableMapReduceUtil.initTableMapJob("inputTableName", "colFam1:*",
> >>>> MapperClass.class,
> >>>>           ImmutableBytesWritable.class, RowResult.class, c);
> >>>>
> >>>>     TableMapReduceUtil.initTableReduceJob("outputTableName",
> >>>> ReducerClass.class, c );
> >>>>
> >>>> MapperClass:
> >>>> =====================================================================
> >>>>
> >>>> @Override
> >>>> public void map(
> >>>> ImmutableBytesWritable key,
> >>>> RowResult row,
> >>>> OutputCollector<ImmutableBytesWritable, RowResult> collector,
> >>>> Reporter reporter) throws IOException {
> >>>>
> >>>>
> >>>> reporter.incrCounter(Counters.ROWS, 1);
> >>>> collector.collect(key, row);
> >>>> }
> >>>>
> >>>>
> ReducerClass:================================================================
> >>>>
> >>>> @Override
> >>>> public void reduce(ImmutableBytesWritable k,
> >>>> Iterator<RowResult> v,
> >>>> OutputCollector<ImmutableBytesWritable, BatchUpdate> c,
> >>>> Reporter r) throws IOException {
> >>>>
> >>>> while (v.hasNext()){
> >>>> BatchUpdate bu = new BatchUpdate(k.get());
> >>>> while (v.hasNext()){
> >>>> RowResult row = v.next();
> >>>> bu.put(Bytes.toBytes("count:rowToCountName"),
> >>>> Bytes.toBytes(row.size()));
> >>>> }
> >>>> c.collect(k, bu);
> >>>> }
> >>>>                             }
> >>>>
> >>>>
> ========================================================================
> >>>>
> >>>> It runs the map/reduce, but I get nothing in my output table.
> >>>>
> >>>> Thanks.
> >>>>
> >>>> llpind
> >>>> --
> >>>> View this message in context:
> >>>>
> http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23952252.html
> >>>> Sent from the HBase User mailing list archive at Nabble.com.
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>
> >> --
> >> View this message in context:
> >>
> http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23954242.html
> >> Sent from the HBase User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23966757.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>

Re: Help with Map/Reduce program

Posted by llpind <so...@hotmail.com>.
Thanks.

Yea I've got that colFam for sure in the HBase table. 

I've been trying to play with rowcounter, and not having much luck either.

I run the command:
hadoop19/bin/hadoop org.apache.hadoop.hbase.mapred.Driver rowcounter
/home/hadoop/dev/rowcounter7 tableA colFam1:


The map/reduce finishes just like it does with my own program, but with all
part files empty in /home/hadoop/dev/rowcounter7.   

Any Ideas?

Billy Pearson-2 wrote:
> 
> You could try scanning it with shell to make sure there is data
> bin/hbase shell
> help ->  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, \
>              STARTROW => 'xyz'}
> 
> so something like
> scan 'tablename', {COLUMNS => ['col1']}
> 
> That will spit out data if there is any
> I thank you might be able to even call
> scan 'tablename'
> 
> Billy
> 
> 
> 
> 
> "llpind" <so...@hotmail.com> wrote in 
> message news:23954242.post@talk.nabble.com...
>>
>> Yeah I noticed that shortly after I posted.  I have it "colFam1:" now. 
>> I'm
>> positive my table has that column family, but my output table still has
>> nothing in it.
>>
>> I'm looking at the source code for rowcounter, and it doesn't even
>> require ':'.  Does it need to be passed in?
>>
>>
>> I may be going about this wrong, I'm open to ideas.  I need a way to 
>> iterate
>> over an entire HBase table, and count columns (column counter instead of
>> rowcounter).
>>
>> Billy Pearson-2 wrote:
>>>
>>> try without the * in the column
>>> "colFam1:*",
>>> try
>>> "colFam1:",
>>>
>>> I do not think the * works as an all option; just leave it blank
>>> (colFam1:) and it will give all results
>>>
>>> Billy
>>>
>>>
>>> "llpind" <so...@hotmail.com> wrote in
>>> message 
>>> news:23952252.post@talk.nabble.com...
>>>>
>>>> Hi again,
>>>>
>>>> I need some help with a map/reduce program I have which copies data
>>>> from
>>>> one
>>>> table to another.  What I would like to do is iterate through an entire
>>>> HBase table, and for a given row key and column family count the number
>>>> of
>>>> records.  So the output table will have a single column family named
>>>> 'count'
>>>> (e.g. entry would look something like 'rowKey1',
>>>> 'count:count_for_rowkey1',
>>>> '534', where the rowkey could be the same as input table ).
>>>>
>>>> here is my first attempt:
>>>>
>>>> CONF:
>>>> =========================================================================
>>>>
>>>> c.setInputFormat(TableInputFormat.class);
>>>> c.setOutputFormat(TableOutputFormat.class);
>>>>
>>>>     TableMapReduceUtil.initTableMapJob("inputTableName", "colFam1:*",
>>>> MapperClass.class,
>>>>           ImmutableBytesWritable.class, RowResult.class, c);
>>>>
>>>>     TableMapReduceUtil.initTableReduceJob("outputTableName",
>>>> ReducerClass.class, c );
>>>>
>>>> MapperClass:
>>>> =====================================================================
>>>>
>>>> @Override
>>>> public void map(
>>>> ImmutableBytesWritable key,
>>>> RowResult row,
>>>> OutputCollector<ImmutableBytesWritable, RowResult> collector,
>>>> Reporter reporter) throws IOException {
>>>>
>>>>
>>>> reporter.incrCounter(Counters.ROWS, 1);
>>>> collector.collect(key, row);
>>>> }
>>>>
>>>> ReducerClass:================================================================
>>>>
>>>> @Override
>>>> public void reduce(ImmutableBytesWritable k,
>>>> Iterator<RowResult> v,
>>>> OutputCollector<ImmutableBytesWritable, BatchUpdate> c,
>>>> Reporter r) throws IOException {
>>>>
>>>> while (v.hasNext()){
>>>> BatchUpdate bu = new BatchUpdate(k.get());
>>>> while (v.hasNext()){
>>>> RowResult row = v.next();
>>>> bu.put(Bytes.toBytes("count:rowToCountName"),
>>>> Bytes.toBytes(row.size()));
>>>> }
>>>> c.collect(k, bu);
>>>> }
>>>>                             }
>>>>
>>>> ========================================================================
>>>>
>>>> It runs the map/reduce, but I get nothing in my output table.
>>>>
>>>> Thanks.
>>>>
>>>> llpind
>>>> -- 
>>>> View this message in context:
>>>> http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23952252.html
>>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>> -- 
>> View this message in context: 
>> http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23954242.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>> 
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23966757.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Help with Map/Reduce program

Posted by Billy Pearson <sa...@pearsonwholesale.com>.
You could try scanning it with shell to make sure there is data
bin/hbase shell
help ->  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, \
             STARTROW => 'xyz'}

so something like
scan 'tablename', {COLUMNS => ['col1']}

That will spit out data if there is any.
I think you might be able to even call
scan 'tablename'

Billy




"llpind" <so...@hotmail.com> wrote in 
message news:23954242.post@talk.nabble.com...
>
> Yeah I noticed that shortly after I posted.  I have it "colFam1:" now. 
> I'm
> positive my table has that column family, but my output table still has
> nothing in it.
>
> I'm looking at the source code for rowcounter, and it doesn't even require
> ':'.  Does it need to be passed in?
>
>
> I may be going about this wrong, I'm open to ideas.  I need a way to 
> iterate
> over an entire HBase table, and count columns (column counter instead of
> rowcounter).
>
> Billy Pearson-2 wrote:
>>
>> try with out the * in the column
>> "colFam1:*",
>> try
>> "colFam1:",
>>
>> I do not thank the * works like a all option just leave it blank colFam1:
>> and it will give all results
>>
>> Billy
>>
>>
>> "llpind" <so...@hotmail.com> wrote in
>> message 
>> news:23952252.post@talk.nabble.com...
>>>
>>> Hi again,
>>>
>>> I need some help with a map/reduce program I have which copies data from
>>> one
>>> table to another.  What I would like to do is iterate through an entire
>>> HBase table, and for a given row key and column family count the number
>>> of
>>> records.  So the output table will have a single column family named
>>> 'count'
>>> (e.g. entry would look something like 'rowKey1',
>>> 'count:count_for_rowkey1',
>>> '534', where the rowkey could be the same as input table ).
>>>
>>> here is my first attempt:
>>>
>>> CONF:
>>> =========================================================================
>>>
>>> c.setInputFormat(TableInputFormat.class);
>>> c.setOutputFormat(TableOutputFormat.class);
>>>
>>>     TableMapReduceUtil.initTableMapJob("inputTableName", "colFam1:*",
>>> MapperClass.class,
>>>           ImmutableBytesWritable.class, RowResult.class, c);
>>>
>>>     TableMapReduceUtil.initTableReduceJob("outputTableName",
>>> ReducerClass.class, c );
>>>
>>> MapperClass:
>>> =====================================================================
>>>
>>> @Override
>>> public void map(
>>> ImmutableBytesWritable key,
>>> RowResult row,
>>> OutputCollector<ImmutableBytesWritable, RowResult> collector,
>>> Reporter reporter) throws IOException {
>>>
>>>
>>> reporter.incrCounter(Counters.ROWS, 1);
>>> collector.collect(key, row);
>>> }
>>>
>>> ReducerClass:================================================================
>>>
>>> @Override
>>> public void reduce(ImmutableBytesWritable k,
>>> Iterator<RowResult> v,
>>> OutputCollector<ImmutableBytesWritable, BatchUpdate> c,
>>> Reporter r) throws IOException {
>>>
>>> while (v.hasNext()){
>>> BatchUpdate bu = new BatchUpdate(k.get());
>>> while (v.hasNext()){
>>> RowResult row = v.next();
>>> bu.put(Bytes.toBytes("count:rowToCountName"),
>>> Bytes.toBytes(row.size()));
>>> }
>>> c.collect(k, bu);
>>> }
>>>                             }
>>>
>>> ========================================================================
>>>
>>> It runs the map/reduce, but I get nothing in my output table.
>>>
>>> Thanks.
>>>
>>> llpind
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23952252.html
>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>>
>
> -- 
> View this message in context: 
> http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23954242.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
> 



Re: Help with Map/Reduce program

Posted by llpind <so...@hotmail.com>.
Yeah, I noticed that shortly after I posted.  I have it as "colFam1:" now.  I'm
positive my table has that column family, but my output table still has
nothing in it.

I'm looking at the source code for RowCounter, and it doesn't even require
the ':'.  Does it need to be passed in?


I may be going about this wrong; I'm open to ideas.  I need a way to iterate
over an entire HBase table and count columns (a column counter instead of a
row counter).
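
One way to shape a column counter (a sketch only, against the same old
TableMap API as the copy job; the class name is made up): have the map emit
each row's column count keyed by the row, and let the reduce side write the
total into the output table.

public static class ColumnCountMapper extends MapReduceBase
		implements TableMap<ImmutableBytesWritable, IntWritable> {

	@Override
	public void map(ImmutableBytesWritable key, RowResult row,
			OutputCollector<ImmutableBytesWritable, IntWritable> collector,
			Reporter reporter) throws IOException {
		// RowResult is a map of column name -> Cell, so its size is the
		// number of columns this row has in the scanned family
		collector.collect(key, new IntWritable(row.size()));
	}
}

The reducer would then sum the IntWritables for each key and put the total
into the 'count:' family with a BatchUpdate.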


-- 
View this message in context: http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23954242.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Help with Map/Reduce program

Posted by Billy Pearson <sa...@pearsonwholesale.com>.
Try without the * in the column: instead of

"colFam1:*",

try

"colFam1:",

I do not think the * works as an "all" option; just leave it blank (colFam1:)
and it will give all results.
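
For example, keeping everything else in your conf the same (a sketch of just
the changed call from your post):

    TableMapReduceUtil.initTableMapJob("inputTableName", "colFam1:",
        MapperClass.class, ImmutableBytesWritable.class,
        RowResult.class, c);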

Billy


"llpind" <so...@hotmail.com> wrote in 
message news:23952252.post@talk.nabble.com...
>
> Hi again,
>
> I need some help with a map/reduce program I have which copies data from 
> one
> table to another.  What I would like to do is iterate through an entire
> HBase table, and for a given row key and column family count the number of
> records.  So the output table will have a single column family named 
> 'count'
> (e.g. entry would look something like 'rowKey1', 
> 'count:count_for_rowkey1',
> '534', where the rowkey could be the same as input table ).
>
> here is my first attempt:
>
> CONF:
> =========================================================================
>
> c.setInputFormat(TableInputFormat.class);
> c.setOutputFormat(TableOutputFormat.class);
>
>     TableMapReduceUtil.initTableMapJob("inputTableName", "colFam1:*",
> MapperClass.class,
>           ImmutableBytesWritable.class, RowResult.class, c);
>
>     TableMapReduceUtil.initTableReduceJob("outputTableName",
> ReducerClass.class, c );
>
> MapperClass:
> =====================================================================
>
> @Override
> public void map(
> ImmutableBytesWritable key,
> RowResult row,
> OutputCollector<ImmutableBytesWritable, RowResult> collector,
> Reporter reporter) throws IOException {
>
>
> reporter.incrCounter(Counters.ROWS, 1);
> collector.collect(key, row);
> }
>
> ReducerClass:================================================================
>
> @Override
> public void reduce(ImmutableBytesWritable k,
> Iterator<RowResult> v,
> OutputCollector<ImmutableBytesWritable, BatchUpdate> c,
> Reporter r) throws IOException {
>
> while (v.hasNext()){
> BatchUpdate bu = new BatchUpdate(k.get());
> while (v.hasNext()){
> RowResult row = v.next();
> bu.put(Bytes.toBytes("count:rowToCountName"),
> Bytes.toBytes(row.size()));
> }
> c.collect(k, bu);
> }
>                             }
>
> ========================================================================
>
> It runs the map/reduce, but I get nothing in my output table.
>
> Thanks.
>
> llpind
> -- 
> View this message in context: 
> http://www.nabble.com/Help-with-Map-Reduce-program-tp23952252p23952252.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>