Posted to dev@hbase.apache.org by edward yoon <ed...@udanax.org> on 2008/02/09 06:32:23 UTC
HQL plan for Hbase 0.2
I have started planning HQL for HBase 0.2.
http://wiki.apache.org/hadoop/Hbase/HbaseShell/HQL
If you have any advice on this topic, please share it in this discussion.
Thanks.
--
B. Regards,
Edward yoon @ NHN, corp.
Re: Doubt in RegExpRowFilter and RowFilters in general
Posted by stack <st...@duboce.net>.
Have you tried enabling DEBUG-level logging? Filters log a lot around
state changes, which might help you figure out this issue. You might
need to add extra logging around line #2401 in HStore.
(I just spent some time trying to get my head around what's going on.
Filters are run at the Store level. It looks like RegExpRowFilter builds
a map of column to value on construction. If the value matches, the
filter returns false, so the cell should be added in each family. I
don't see anything obviously wrong in here.)
St.Ack
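A minimal sketch of the per-cell behavior described above, under a simplified model: a column-to-expected-value map is fixed at construction, and a per-cell check returns false ("keep") when the value matches and true ("filter out") when it does not. This is not the RegExpRowFilter source; the class and method names (ConditionalCellFilterSketch, filterCell) are hypothetical, and String column names stand in for Hadoop's Text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of a per-cell column/value filter.
 * Convention (as described in the thread): true means "filter out", false means "keep".
 */
public class ConditionalCellFilterSketch {
    private final Map<String, byte[]> conditionals;

    public ConditionalCellFilterSketch(Map<String, byte[]> conditionals) {
        // The column -> expected value map is fixed at construction time.
        this.conditionals = new HashMap<>(conditionals);
    }

    /** Checks a single cell against the condition for its column, if there is one. */
    public boolean filterCell(String column, byte[] value) {
        byte[] expected = conditionals.get(column);
        if (expected == null) {
            return false; // no condition on this column: keep the cell
        }
        return !Arrays.equals(expected, value); // value mismatch: filter the cell out
    }

    public static void main(String[] args) {
        Map<String, byte[]> conds = new HashMap<>();
        conds.put("cf1:a", "myvalue1".getBytes(StandardCharsets.UTF_8));
        conds.put("cf2:b", "myvalue2".getBytes(StandardCharsets.UTF_8));
        ConditionalCellFilterSketch filter = new ConditionalCellFilterSketch(conds);

        System.out.println(filter.filterCell("cf1:a", "myvalue1".getBytes(StandardCharsets.UTF_8))); // false
        System.out.println(filter.filterCell("cf2:b", "other".getBytes(StandardCharsets.UTF_8)));    // true
    }
}
```

In this model each cell is judged on its own, so a row whose cells all match passes the per-cell stage for every column; any AND-across-columns failure has to come from a later, row-level check.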
Re: Doubt in RegExpRowFilter and RowFilters in general
Posted by David Alves <dr...@criticalsoftware.com>.
St.Ack
Thanks for your reply.
When I use RegExpRowFilter with only one (either one) of the conditions,
it works (the rows are passed on to the Map/Reduce task), but there is
still a problem: only one of the columns is present in the resulting
MapWritable (I'm using my own TableInputFormat) returned by the scanner.
So I still use the filter to check for more rows (I build a scanner with
one of the conditions, the rarest one, and iterate through it to try to
find the other), but not in the TableInputFormat itself; there I just
discard the unwanted values in the Mapper. That is a performance hit (if
the scanner did the filtering, the row simply wouldn't be sent to the
master, right? Therefore less traffic in distributed mode?), but no big
deal.
It seems to me that when the filter is applied, only the column that
matches (or the one that doesn't match; I'm not sure at the moment) is
passed to the scanner result.
As to the second point, I'm running HBase in local mode for development
and the DEBUG log for the HMaster shows nothing; my process simply hangs
indefinitely.
When I have some free time I'll try to look into the sources and
pinpoint the problem more accurately.
David
On Mon, 2008-02-11 at 10:36 -0800, stack wrote:
Re: Doubt in RegExpRowFilter and RowFilters in general
Posted by stack <st...@duboce.net>.
David:
<disclaimer>IMO, filters are a sweet bit of functionality, but they are
not easy to use. They have also seen little exercise, so you are
probably tripping over bugs. That said, I know they basically
work.</disclaimer>
I'd suggest you progress from basic filtering toward the filter you'd
like to implement. Does the RegExpRowFilter do the right thing when
filtering one column only?
On the ClassNotFoundException, yeah, it should be coming out on the
client. Can you see it in the server logs? Do you get any exceptions
client-side?
St.Ack
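One plausible reading of the ClassNotFoundException question: if a filter travels to the server as a serialized object that the receiving side re-creates from its class name, the class must be on that side's classpath. The sketch below illustrates only that general mechanism with plain Java reflection (Class.forName); it is an assumption-level illustration, not HBase's actual RPC or serialization code, and com.example.MyRowFilter is a hypothetical class name.

```java
/**
 * Hypothetical illustration: re-creating an object from its class name,
 * as a receiver of a serialized object might do.
 */
public class ServerSideInstantiationSketch {
    static Object instantiate(String className) throws ReflectiveOperationException {
        // Fails with ClassNotFoundException if the class is not on this JVM's classpath.
        Class<?> clazz = Class.forName(className);
        return clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // A class present on the classpath can be re-created by name.
        Object ok = instantiate("java.util.ArrayList");
        System.out.println(ok.getClass().getName()); // java.util.ArrayList

        // A class this JVM has never seen cannot.
        try {
            instantiate("com.example.MyRowFilter");
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException: " + e.getMessage());
        }
    }
}
```

Whether such an exception then reaches the client or leaves the client waiting depends on how the server reports the failure; the sketch does not model that.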
Re: Doubt in RegExpRowFilter and RowFilters in general
Posted by David Alves <dr...@criticalsoftware.com>.
Hi Again
In my previous example I seem to have misplaced a "new" keyword (new
myvalue1.getBytes() where it should have been myvalue1.getBytes()).
On another note, my program hangs when I supply my own filter to the
scanner (I suppose the nodes don't know my class, so there should be a
ClassNotFoundException, right?).
Regards
David Alves
On Mon, 2008-02-11 at 16:51 +0000, David Alves wrote:
Doubt in RegExpRowFilter and RowFilters in general
Posted by David Alves <dr...@criticalsoftware.com>.
Hi Guys
In my previous email I might have misunderstood the role of
RowFilterInterface, so I'll pose my question more clearly (the last one
wasn't in question form :)).
I have a setup where a table has two columns belonging to different
column families (table A: cf1:a, cf2:b).
I'm trying to build a filter so that a scanner only returns the rows
where cf1:a = myvalue1 and cf2:b = myvalue2.
I've built a RegExpRowFilter like this:
Map<Text, byte[]> conditionalsMap = new HashMap<Text, byte[]>();
conditionalsMap.put(new Text("cf1:a"), myvalue1.getBytes());
conditionalsMap.put(new Text("cf2:b"), myvalue2.getBytes());
return new RegExpRowFilter(".*", conditionalsMap);
My problem is that this filter always fails, even though I know for sure
that there are rows whose columns match my values.
I'm building the scanner like this (the purpose in this case is to find
out whether there are more rows that match my filter):
final Text startKey = this.htable.getStartKeys()[0];
HScannerInterface scanner = htable.obtainScanner(
    new Text[] {new Text("cf1:a"), new Text("cf2:b")}, startKey, rowFilterInterface);
return scanner.iterator().hasNext();
Can anyone give me a hand please.
Thanks in advance
David Alves
Re: RegExpRowFilter with multiple conditions on rows matching both
Posted by David Alves <dr...@criticalsoftware.com>.
I now realize the text is a bit confusing.. sorry for that.
Also that last paragraph should en with: ... at the same time.
Regards
David
On Sun, 2008-02-10 at 01:03 +0000, David Alves wrote:
RegExpRowFilter with multiple conditions on rows matching both
Posted by David Alves <dr...@criticalsoftware.com>.
Hi All!
First of all, congrats on a great piece of software.
I have a table with two column families (A, B), each with one column.
When I build a RegExpRowFilter to select only rows whose columns A AND B
match the criteria (let's say A:a = 1 and B:b = 2), all the rows are
filtered out.
This is strange, because if I build the map required by the constructor
with only one or the other of the conditionals, the matching rows are
not filtered. If a row passes each conditional in separate runs,
shouldn't it pass both of them in the same run?
More precisely, when running with both conditionals, the rows pass the
filter() method for both columns but fail the filterNotNull() method.
The debug log tells me that the TreeMap<Text,byte[]> passed to
filterNotNull() by the HStore scanner doesn't contain both columns at
the same time (the method is called twice, first with one column and
then with the other).
Finally, when running with only one of the conditionals, the
filterNotNull() method still returns true once but returns false the
second time (therefore returning the record), meaning that not all
columns of the same row are passing through the cycle at the same time.
Regards
David Alves
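The symptom reported above, where filterNotNull() sees each family's columns separately, can be reproduced with a small standalone model. Assumptions, not verified against the HBase source: the row-level check runs once per store (column family) on that store's partial column map, and a conditioned column that is absent from the map counts as a failure. The names (PerStoreFilterSketch, filterNotNull, b) are hypothetical, and String keys stand in for Text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of a row-level check applied per store instead of per whole row. */
public class PerStoreFilterSketch {
    /** Returns true ("filter out") unless every conditioned column is present with its expected value. */
    static boolean filterNotNull(Map<String, byte[]> columns, Map<String, byte[]> conditionals) {
        for (Map.Entry<String, byte[]> c : conditionals.entrySet()) {
            byte[] actual = columns.get(c.getKey());
            if (actual == null || !Arrays.equals(actual, c.getValue())) {
                return true; // missing or different value rejects the row
            }
        }
        return false;
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, byte[]> conds = new TreeMap<>();
        conds.put("A:a", b("1"));
        conds.put("B:b", b("2"));

        // One row whose cells really match, split the way a per-store scan would see it.
        Map<String, byte[]> storeA = new TreeMap<>();
        storeA.put("A:a", b("1"));
        Map<String, byte[]> storeB = new TreeMap<>();
        storeB.put("B:b", b("2"));

        // Each partial map lacks the other family's column, so both checks reject.
        System.out.println(filterNotNull(storeA, conds)); // true
        System.out.println(filterNotNull(storeB, conds)); // true

        // Checked against the merged row, the AND condition passes.
        Map<String, byte[]> wholeRow = new TreeMap<>(storeA);
        wholeRow.putAll(storeB);
        System.out.println(filterNotNull(wholeRow, conds)); // false
    }
}
```

Under these assumptions, any conditional spanning two families can never be satisfied, while a single-family conditional passes on the store that holds its column, which matches the pattern described in the post.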
Re: HQL plan for Hbase 0.2
Posted by Bryan Duxbury <br...@rapleaf.com>.
Edward,
I don't really understand what this example is supposed to be
suggesting. Can you try to add some explanation to your code?
-Bryan
On Feb 9, 2008, at 7:40 PM, edward yoon wrote:
Re: HQL plan for Hbase 0.2
Posted by edward yoon <ed...@udanax.org>.
Thanks for the reviews.
What do you think about the example below?
Would it be possible?
map {
  // row is the server host name
  rs = hql.executeQuery("select filePath: from server_data where row='" + row
      + "' and column='" + columnfamily + ":" + qualifier + "';");
  hql.executeQuery("load data file '" + rs.result() + "' into "
      + resultTable + ";");
}
main(String[] args) {
  columnfamily = "webserver";        /* or "hadoop" */
  qualifier = "access_log";          /* or "task_tracker_log" */
  resultTable = "access_log_table";  /* or "task_tracker_log_table" */
}
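Bryan asked what the example is meant to show. Read literally, each map task appears to look up a file path for its host (the row) in server_data and then load that file into a result table. A minimal sketch of just the query-string construction, assuming the HQL syntax shown in the post; hql.executeQuery and rs.result() come from the proposal and are not implemented here, and buildSelect, buildLoad, "host1", and the file path are hypothetical. Note that the space before "and" matters: without it the literals join as row='host1'and column=...

```java
/** Hypothetical helpers that build the two HQL statements from the example. */
public class HqlQuerySketch {
    /** Builds the lookup query for one host (row) and one column. */
    static String buildSelect(String row, String columnFamily, String qualifier) {
        return "select filePath: from server_data where row='" + row
            + "' and column='" + columnFamily + ":" + qualifier + "';";
    }

    /** Builds the load statement for the file path returned by the lookup. */
    static String buildLoad(String filePath, String resultTable) {
        return "load data file '" + filePath + "' into " + resultTable + ";";
    }

    public static void main(String[] args) {
        System.out.println(buildSelect("host1", "webserver", "access_log"));
        // select filePath: from server_data where row='host1' and column='webserver:access_log';
        System.out.println(buildLoad("/logs/host1/access_log", "access_log_table"));
        // load data file '/logs/host1/access_log' into access_log_table;
    }
}
```

Plain string concatenation breaks if a host name or path ever contains a single quote; a real client API would need some form of quoting or parameter binding.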
On 2/10/08, Bryan Duxbury <br...@rapleaf.com> wrote:
--
B. Regards,
Edward yoon @ NHN, corp.
Re: HQL plan for Hbase 0.2
Posted by Bryan Duxbury <br...@rapleaf.com>.
I added a few comments.
-Bryan
On Feb 9, 2008, at 1:39 PM, stack wrote:
Re: HQL plan for Hbase 0.2
Posted by stack <st...@duboce.net>.
I added comments, Edward (thanks for adding a link to your HQL plans
from the 0.17 plans).
St.Ack
RegExpRowFilter with multiple conditions on rows matching both
Posted by David Alves <da...@gmail.com>.
Hi All!
First of all, congrats on a great piece of software.
I have a table with two column families (A, B), each with one column.
When I build a RegExpRowFilter to select only rows whose columns A AND B
match the criteria (let's say A:a = 1 and B:b = 2), all the rows are
filtered out.
This is strange, because if I build the map required by the constructor
with only one or the other of the conditionals, the matching rows are
not filtered. If a row passes each conditional in separate runs,
shouldn't it pass both of them in the same run?
More precisely, when running with both conditionals, the rows pass the
filter() method for both columns but fail the filterNotNull() method.
The debug log tells me that the TreeMap<Text,byte[]> passed to
filterNotNull() by the HStore scanner doesn't contain both columns at
the same time (the method is called twice, first with one column and
then with the other).
Finally, when running with only one of the conditionals, the
filterNotNull() method still returns true once but returns false the
second time (therefore returning the record), meaning that not all
columns of the same row are passing through the cycle at the same time.
Regards
David Alves