Posted to user@hbase.apache.org by Erik Holstad <er...@gmail.com> on 2008/10/18 03:39:09 UTC
How to get all columns from the scanner in a Map-Reduce job?
Hi!
I'm trying to figure out how to get all the columns in a Map-Reduce job
without having to specify them all.
I found this line in TableInputFormat.java:
@see org.apache.hadoop.hbase.regionserver.HAbstractScanner for column name
* wildcards
but didn't find any help over in HAbstractScanner.
Regards Erik
Re: How to get all columns from the scanner in a Map-Reduce job?
Posted by Ryan Smith <ry...@gmail.com>.
To match all row keys, I used (.+) as a regexp, but now I'm going to test it
again to make sure.
-Ryan
On Mon, Oct 20, 2008 at 4:54 PM, Erik Holstad <er...@gmail.com> wrote:
> Tried it and it didn't work, but then I realized that it doesn't
> work for scanners either, so I refiled the issue to client/944 instead
>
> Regards Erik
Re: How to get all columns from the scanner in a Map-Reduce job?
Posted by Erik Holstad <er...@gmail.com>.
Tried it and it didn't work, but then I realized that it doesn't
work for scanners either, so I refiled the issue to client/944 instead.
Regards Erik
On Mon, Oct 20, 2008 at 11:13 AM, Erik Holstad <er...@gmail.com> wrote:
> Hi Stack!
> Will try that fix, opened up a Jira-941 in the meantime.
>
> Regards Erik
Re: How to get all columns from the scanner in a Map-Reduce job?
Posted by Erik Holstad <er...@gmail.com>.
Hi Stack!
Will try that fix, opened up a Jira-941 in the meantime.
Regards Erik
On Sun, Oct 19, 2008 at 4:05 PM, Michael Stack <st...@duboce.net> wrote:
> What happens if you pass a column name of "^.*$"? Will it return all
> columns? I don't think it will. IIRC the regex can only be applied to the
> column qualifier portion of column name which means you'd have to write out
> a column spec for your mapreduce job per column family. So, if you had
> three families, each with a thousand columns, a column specification of
> "family1:.* family2:.* family3:.*" should return them all.
>
> I took a quick look. It should be the case that an empty string returns
> all columns of a row but currently at least, it'll fail on line #75 in
> TableInputFormat:
>
> if (colArg == null || colArg.length() == 0) {
>
> Try removing the colArg.length(). Maybe it'll work then? (You'll pass in
> an array of columns of zero-length -- I think that'll work).
>
> Meantime, open a JIRA, Erik. Seems like a basic expectation that there be
> a way to get all columns in an MR job.
>
> St.Ack
Re: How to get all columns from the scanner in a Map-Reduce job?
Posted by Michael Stack <st...@duboce.net>.
What happens if you pass a column name of "^.*$"? Will it return all
columns? I don't think it will. IIRC the regex can only be applied to
the column qualifier portion of the column name, which means you'd have to
write out a column spec for your mapreduce job per column family. So,
if you had three families, each with a thousand columns, a
column specification of "family1:.* family2:.* family3:.*" should
return them all.
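
A rough sketch of how such a per-family spec could be put together when the family names are known up front. The class and method names (ColumnSpecSketch, allColumnsSpec) are made up for illustration and are not part of HBase; only the space-separated "family:.*" format comes from the message above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ColumnSpecSketch {
    // Hypothetical helper: emits one "family:.*" entry per column family,
    // separated by single spaces, as in the spec format described above.
    static String allColumnsSpec(List<String> families) {
        return families.stream()
                .map(family -> family + ":.*")
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // prints: family1:.* family2:.* family3:.*
        System.out.println(allColumnsSpec(Arrays.asList("family1", "family2", "family3")));
    }
}
```

The resulting string would then be handed to the job wherever the column spec is normally supplied; how that is wired up depends on the HBase version and is not shown here.
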
I took a quick look. It should be the case that an empty string returns
all columns of a row, but currently at least, it'll fail on line #75 in
TableInputFormat:
    if (colArg == null || colArg.length() == 0) {
Try removing the colArg.length() check. Maybe it'll work then? (You'll pass
in a zero-length array of columns -- I think that'll work.)
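
To make the suggested change concrete, here is a minimal standalone sketch of a column-argument parser that rejects only null and turns an empty string into a zero-length array. It is not the TableInputFormat source; the class and method names (ColumnArgSketch, parseColumns) and the whitespace splitting are assumptions for illustration. Whether the scanner then treats a zero-length array as "all columns" is exactly what this thread leaves open.

```java
public class ColumnArgSketch {
    // Hypothetical stand-in for the check discussed above: the length test
    // is dropped, so only a null spec is treated as an error.
    static String[] parseColumns(String colArg) {
        if (colArg == null) {
            throw new IllegalArgumentException("column spec must not be null");
        }
        String trimmed = colArg.trim();
        // "".split(...) would return a one-element array holding "",
        // so the empty case is handled explicitly.
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(parseColumns("").length);                      // prints 0
        System.out.println(parseColumns("family1:.* family2:.*").length); // prints 2
    }
}
```
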
Meantime, open a JIRA, Erik. Seems like a basic expectation that there
be a way to get all columns in an MR job.
St.Ack
Erik Holstad wrote:
> Hey!
> Yes I did find that line in HAbstractScanner.java but not really sure
> how to use it to do what I want to do.
>
> Regards Erik
Re: How to get all columns from the scanner in a Map-Reduce job?
Posted by Erik Holstad <er...@gmail.com>.
Hey!
Yes I did find that line in HAbstractScanner.java but not really sure
how to use it to do what I want to do.
Regards Erik
On Sun, Oct 19, 2008 at 7:43 AM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> I think you are looking for this :
>
> // Pattern to determine if a column key is a regex
> static Pattern isRegexPattern =
> Pattern.compile("^.*[\\\\+|^&*$\\[\\]\\}{)(]+.*$");
>
> J-D
Re: How to get all columns from the scanner in a Map-Reduce job?
Posted by Jean-Daniel Cryans <jd...@apache.org>.
I think you are looking for this:
    // Pattern to determine if a column key is a regex
    static Pattern isRegexPattern =
        Pattern.compile("^.*[\\\\+|^&*$\\[\\]\\}{)(]+.*$");
J-D
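
For anyone wondering what that pattern actually accepts, here is a small standalone check. It copies the pattern literal quoted above; the class name, the helper method and the sample column keys are made up for illustration, and nothing here shows how HBase uses the result internally.

```java
import java.util.regex.Pattern;

public class IsRegexSketch {
    // Same pattern literal as quoted in the message above.
    static final Pattern IS_REGEX_PATTERN =
        Pattern.compile("^.*[\\\\+|^&*$\\[\\]\\}{)(]+.*$");

    // Hypothetical helper: true if the key contains at least one of the
    // metacharacters listed in the character class.
    static boolean looksLikeRegex(String columnKey) {
        return IS_REGEX_PATTERN.matcher(columnKey).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeRegex("family1:.*"));   // true  ('*' is in the class)
        System.out.println(looksLikeRegex("family1:name")); // false (no listed metacharacter)
        System.out.println(looksLikeRegex("^.*$"));         // true
    }
}
```

Note that '.' on its own is not in the character class, so a key such as "family1:a.b" is not flagged as a regex by this test.
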
On Fri, Oct 17, 2008 at 9:39 PM, Erik Holstad <er...@gmail.com> wrote:
> Hi!
> I'm trying to figure out how to get all the columns in a Map-Reduce job
> without having to specify
> them all?
>
> Found the line:
> @see org.apache.hadoop.hbase.regionserver.HAbstractScanner for column name
> * wildcards
>
> in TableInputFormat.java but didn't find any help over in the HAbScanner.
>
> Regards Erik
>