Posted to user@hbase.apache.org by Erik Holstad <er...@gmail.com> on 2008/10/18 03:39:09 UTC

How to get all columns from the scanner in a Map-Reduce job?

Hi!
I'm trying to figure out how to get all the columns in a Map-Reduce job
without having to specify them all.

I found this line in TableInputFormat.java:

   * @see org.apache.hadoop.hbase.regionserver.HAbstractScanner for column name
   *      wildcards

but didn't find any help over in HAbstractScanner.

Regards Erik

Re: How to get all columns from the scanner in a Map-Reduce job?

Posted by Ryan Smith <ry...@gmail.com>.
To match all row keys, I used (.+) as a regexp, but now I'm going to test it
again to make sure.

-Ryan

On Mon, Oct 20, 2008 at 4:54 PM, Erik Holstad <er...@gmail.com> wrote:

> Tried it and it didn't work, but then I realized that it doesn't
> work for scanners either, so I refiled the issue to client/944 instead.
>
> Regards Erik

Re: How to get all columns from the scanner in a Map-Reduce job?

Posted by Erik Holstad <er...@gmail.com>.
Tried it and it didn't work, but then I realized that it doesn't
work for scanners either, so I refiled the issue to client/944 instead.

Regards Erik


On Mon, Oct 20, 2008 at 11:13 AM, Erik Holstad <er...@gmail.com> wrote:

> Hi Stack!
> Will try that fix. I opened up Jira-941 in the meantime.
>
> Regards Erik

Re: How to get all columns from the scanner in a Map-Reduce job?

Posted by Erik Holstad <er...@gmail.com>.
Hi Stack!
Will try that fix. I opened up Jira-941 in the meantime.

Regards Erik




On Sun, Oct 19, 2008 at 4:05 PM, Michael Stack <st...@duboce.net> wrote:

> What happens if you pass a column name of "^.*$"?  Will it return all
> columns?  I don't think it will.  IIRC the regex can only be applied to
> the column qualifier portion of the column name, which means you'd have to
> write out a column spec for your mapreduce job per column family.  So, if
> you had three families, each with a thousand columns, a column
> specification of "family1:.* family2:.* family3:.*" should return them
> all.
>
> I took a quick look.  It should be the case that an empty string returns
> all columns of a row but currently at least, it'll fail on line #75 in
> TableInputFormat:
>
>   if (colArg == null || colArg.length() == 0) {
>
> Try removing the colArg.length() check.  Maybe it'll work then? (You'll pass
> in an array of columns of zero-length -- I think that'll work).
>
> Meantime, open a JIRA, Erik.  Seems like a basic expectation that there
> be a way to get all columns in an MR job.
>
> St.Ack

Re: How to get all columns from the scanner in a Map-Reduce job?

Posted by Michael Stack <st...@duboce.net>.
What happens if you pass a column name of "^.*$"?  Will it return all
columns?  I don't think it will.  IIRC the regex can only be applied to
the column qualifier portion of the column name, which means you'd have to
write out a column spec for your mapreduce job per column family.  So, if
you had three families, each with a thousand columns, a column
specification of "family1:.* family2:.* family3:.*" should return them
all.
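
A minimal sketch of building such a per-family spec in Java. The class and
method names are hypothetical, the space-separated format is taken from the
example above, and whether the scanner honours a bare ".*" qualifier is not
confirmed in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ColumnSpecSketch {
    // Hypothetical helper: turns family names into "family:.*" entries
    // joined by single spaces, matching the spec format quoted above.
    static String buildColumnSpec(List<String> families) {
        return families.stream()
            .map(family -> family + ":.*")
            .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        List<String> families = Arrays.asList("family1", "family2", "family3");
        // Prints: family1:.* family2:.* family3:.*
        System.out.println(buildColumnSpec(families));
    }
}
```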

I took a quick look.  It should be the case that an empty string returns 
all columns of a row but currently at least, it'll fail on line #75 in 
TableInputFormat:

    if (colArg == null || colArg.length() == 0) {

Try removing the colArg.length() check.  Maybe it'll work then? (You'll pass
in an array of columns of zero-length -- I think that'll work).
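
One thing to watch if the check is removed: in Java, "".split(" ") returns a
one-element array holding the empty string, not a zero-length array. The
sketch below is a hypothetical stand-in for the column parsing, not the actual
TableInputFormat code; parseColumns and its error message are made-up names.

```java
public class ColumnArgSketch {
    // Hypothetical helper, NOT the real TableInputFormat logic.
    // A null spec is rejected; an empty spec yields a zero-length array,
    // which is the "all columns" case discussed above.
    static String[] parseColumns(String colArg) {
        if (colArg == null) {
            throw new IllegalArgumentException("column list was not set");
        }
        String trimmed = colArg.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        // Column names are space-separated in the spec.
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(parseColumns("").length);                      // 0
        System.out.println(parseColumns("family1:.* family2:.*").length); // 2
        System.out.println("".split(" ").length);                         // 1
    }
}
```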

Meantime, open a JIRA, Erik.  Seems like a basic expectation that there
be a way to get all columns in an MR job.

St.Ack

Erik Holstad wrote:
> Hey!
> Yes, I did find that line in HAbstractScanner.java, but I'm not really sure
> how to use it to do what I want to do.
>
> Regards Erik


Re: How to get all columns from the scanner in a Map-Reduce job?

Posted by Erik Holstad <er...@gmail.com>.
Hey!
Yes, I did find that line in HAbstractScanner.java, but I'm not really sure
how to use it to do what I want to do.

Regards Erik

On Sun, Oct 19, 2008 at 7:43 AM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> I think you are looking for this:
>
> // Pattern to determine if a column key is a regex
>  static Pattern isRegexPattern =
>    Pattern.compile("^.*[\\\\+|^&*$\\[\\]\\}{)(]+.*$");
>
> J-D

Re: How to get all columns from the scanner in a Map-Reduce job?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I think you are looking for this:

// Pattern to determine if a column key is a regex
  static Pattern isRegexPattern =
    Pattern.compile("^.*[\\\\+|^&*$\\[\\]\\}{)(]+.*$");
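
A small sketch of what that pattern accepts, assuming it is applied to the
whole column key with matches(). The class and method names here are
hypothetical; only the pattern string is copied from above. Note that '.' on
its own is not in the character class, so a key only counts as a regex if it
contains one of \ + | ^ & * $ [ ] { } ( ).

```java
import java.util.regex.Pattern;

public class ColumnRegexCheck {
    // Pattern string copied from the snippet above.
    static final Pattern IS_REGEX_PATTERN =
        Pattern.compile("^.*[\\\\+|^&*$\\[\\]\\}{)(]+.*$");

    // Hypothetical helper: true if the column key contains a regex metacharacter
    // from the character class above.
    static boolean looksLikeRegex(String columnKey) {
        return IS_REGEX_PATTERN.matcher(columnKey).matches();
    }

    public static void main(String[] args) {
        String[] keys = {"family1:qual", "family1:.*", "family1:", "family1:a.b"};
        for (String key : keys) {
            // Only "family1:.*" is reported as a regex, because of the '*'.
            System.out.println(key + " -> " + looksLikeRegex(key));
        }
    }
}
```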

J-D

On Fri, Oct 17, 2008 at 9:39 PM, Erik Holstad <er...@gmail.com> wrote:

> Hi!
> I'm trying to figure out how to get all the columns in a Map-Reduce job
> without having to specify them all.
>
> I found this line in TableInputFormat.java:
>
>    * @see org.apache.hadoop.hbase.regionserver.HAbstractScanner for column name
>    *      wildcards
>
> but didn't find any help over in HAbstractScanner.
>
> Regards Erik
>