Posted to dev@hbase.apache.org by edward yoon <ed...@udanax.org> on 2008/02/09 06:32:23 UTC

HQL plan for Hbase 0.2

I have started planning HQL for Hbase 0.2.
http://wiki.apache.org/hadoop/Hbase/HbaseShell/HQL

If you have any advice on this topic, please share it with us in this discussion.
Thanks.
-- 
B. Regards,
Edward yoon @ NHN, corp.

Re: Doubt in RegExpRowFilter and RowFilters in general

Posted by stack <st...@duboce.net>.
Have you tried enabling DEBUG-level logging?  Filters have lots of 
logging around state changes, which might help figure out this issue.  
You might need to add extra logging around line #2401 in HStore.

(I just spent some time trying to bend my head around what's going on.  
Filters are run at the Store level.  It looks like, in RegExpRowFilter, 
a map of column to value is made on construction.  If the value matches, 
the filter returns false, so the cell should be added in each family.  
I don't see anything obviously wrong in here.)
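
The matching rule described above can be sketched in isolation. This is a minimal model, not the HBase class: the name ColumnValueFilterSketch is hypothetical, plain String keys stand in for Text, and the only behaviour assumed is what the paragraph states (a column-to-value map built at construction, and a filter call that returns false when the value matches).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a column-value filter; this is not the HBase class. */
class ColumnValueFilterSketch {
    // Column name -> value the column must equal, built once at construction.
    private final Map<String, byte[]> expected = new HashMap<>();

    ColumnValueFilterSketch(Map<String, byte[]> conditions) {
        expected.putAll(conditions);
    }

    /** Returns true when the cell should be filtered OUT, false when it may be kept. */
    boolean filter(String column, byte[] value) {
        byte[] want = expected.get(column);
        if (want == null) {
            return false; // no condition on this column: keep the cell
        }
        return !Arrays.equals(want, value); // a mismatch filters the cell out
    }

    public static void main(String[] args) {
        Map<String, byte[]> conditions = new HashMap<>();
        conditions.put("cf1:a", "myvalue1".getBytes());
        ColumnValueFilterSketch f = new ColumnValueFilterSketch(conditions);
        System.out.println(f.filter("cf1:a", "myvalue1".getBytes())); // false: value matches
        System.out.println(f.filter("cf1:a", "other".getBytes()));    // true: value differs
    }
}
```

Under this model a single-column condition behaves as expected, which fits the observation that nothing looks obviously wrong in the per-cell check itself.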

St.Ack




Re: Doubt in RegExpRowFilter and RowFilters in general

Posted by David Alves <dr...@criticalsoftware.com>.
St.Ack

Thanks for your reply.

When I use RegExpRowFilter with only one (either one) of the conditions
it works (the rows are passed on to the Map/Reduce task), but there is
still a problem: only one of the columns is present in the resulting
MapWritable (I'm using my own table input format) from the scanner.
So I still use the filter to check for more rows (I build a scanner with
one of the conditions, the rarest one, and iterate through to try to find
the other), but not in the table input format itself (I just discard the
unwanted values in the Mapper). That is a performance hit (if it were done
in the scanner, the row simply wouldn't be sent to the master, right, so
there would be less traffic in distributed mode?), but no big deal.
It seems to me that when the filter is applied, only the column that
matches (or the one that doesn't match, I'm not sure at the moment) is
passed to the scanner result.
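
The two-stage workaround described above can be sketched without any HBase dependency. Everything here is an assumption for illustration: the class and method names are hypothetical, an in-memory map stands in for the table, and scanWithOneCondition plays the role of the scanner built with the single, rarest condition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of "filter on one condition, check the other afterwards". */
class TwoStageMatchSketch {
    /** Stage 1 stand-in for the filtered scan: row keys whose column equals the wanted value. */
    static List<String> scanWithOneCondition(
            Map<String, Map<String, byte[]>> table, String column, byte[] wanted) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, byte[]>> row : table.entrySet()) {
            if (Arrays.equals(row.getValue().get(column), wanted)) {
                keys.add(row.getKey());
            }
        }
        return keys;
    }

    /** Stage 2: keep only the candidates whose other column also matches. */
    static List<String> matchBoth(Map<String, Map<String, byte[]>> table,
            String rareColumn, byte[] rareValue, String otherColumn, byte[] otherValue) {
        List<String> result = new ArrayList<>();
        for (String key : scanWithOneCondition(table, rareColumn, rareValue)) {
            if (Arrays.equals(table.get(key).get(otherColumn), otherValue)) {
                result.add(key);
            }
        }
        return result;
    }

    /** Builds a row with the two columns used in this thread. */
    static Map<String, byte[]> row(String a, String b) {
        Map<String, byte[]> r = new LinkedHashMap<>();
        r.put("cf1:a", a.getBytes());
        r.put("cf2:b", b.getBytes());
        return r;
    }

    public static void main(String[] args) {
        Map<String, Map<String, byte[]>> table = new LinkedHashMap<>();
        table.put("r1", row("myvalue1", "myvalue2"));
        table.put("r2", row("myvalue1", "other"));
        table.put("r3", row("other", "myvalue2"));
        System.out.println(matchBoth(table, "cf1:a", "myvalue1".getBytes(),
                "cf2:b", "myvalue2".getBytes())); // [r1]
    }
}
```

The cost noted above shows up in stage 2: every stage 1 candidate is fetched in full before the second condition can reject it.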

As to the second point, I'm running HBase in local mode for development
and the DEBUG log for the HMaster shows nothing; my process simply hangs
indefinitely.

When I have some free time I'll try to look into the sources and
pinpoint the problem more accurately.

David



Re: Doubt in RegExpRowFilter and RowFilters in general

Posted by stack <st...@duboce.net>.
David:

<disclaimer>IMO, filters are a bit of sweet functionality but they are 
not easy to use.  They also have seen little exercise so you are 
probably tripping over bugs.  That said, I know they basically 
work.</disclaimer>

I'd suggest you progress from basic filtering toward the filter you'd 
like to implement.   Does the RegExpRowFilter do the right thing when 
filtering one column only?

On the ClassNotFoundException, yeah, it should be coming out on the 
client.  Can you see it in the server logs?  Do you get any exceptions 
client-side?

St.Ack





Re: Doubt in RegExpRowFilter and RowFilters in general

Posted by David Alves <dr...@criticalsoftware.com>.
Hi Again

In my previous example I seem to have misplaced a "new" keyword (new
myvalue1.getBytes() where it should have been myvalue1.getBytes()).

On another note my program hangs when I supply my own filter to the
scanner (I suppose it's clear that the nodes don't know my class so
there should be a ClassNotFoundException right?).

Regards
David Alves 




Doubt in RegExpRowFilter and RowFilters in general

Posted by David Alves <dr...@criticalsoftware.com>.
Hi Guys
	In my previous email I might have misunderstood the roles of the
RowFilterInterfaces so I'll pose my question more clearly (since the
last one wasn't in question form :)).
	I have a setup where a table has two columns belonging to different
column families (table A with cf1:a and cf2:b).

I'm trying to build a filter so that a scanner only returns the rows
where cf1:a = myvalue1 and cf2:b = myvalue2.

I've built a RegExpRowFilter like this:

Map<Text, byte[]> conditionalsMap = new HashMap<Text, byte[]>();
conditionalsMap.put(new Text("cf1:a"), new myvalue1.getBytes());
conditionalsMap.put(new Text("cf2:b"), myvalue2.getBytes());
return new RegExpRowFilter(".*", conditionalsMap);

My problem is that this filter always fails, even though I know for sure
that there are rows whose columns match my values.

I'm building the scanner like this (the purpose in this case is to
find out whether there are more values that match my filter):

final Text startKey = this.htable.getStartKeys()[0];
HScannerInterface scanner = htable.obtainScanner(
    new Text[] {new Text("cf1:a"), new Text("cf2:b")}, startKey, rowFilterInterface);
return scanner.iterator().hasNext();

Can anyone give me a hand, please?

Thanks in advance
David Alves




Re: RegExpRowFilter with multiple conditions on rows matching both

Posted by David Alves <dr...@criticalsoftware.com>.
I now realize the text is a bit confusing; sorry for that.
Also, that last paragraph should end with: ... at the same time.

Regards 
David





RegExpRowFilter with multiple conditions on rows matching both

Posted by David Alves <dr...@criticalsoftware.com>.
Hi All!

First of all, congrats on the great piece of software.
I have a table with two column families (A, B), each with one column. When I
build a RegExpRowFilter to select only rows whose columns A AND B match
the criteria (let's say A:a = 1 and B:b = 2), all the rows are filtered.
This is strange because if I build the map required by the constructor
with only one or the other of the conditionals, the rows that match won't
be filtered, meaning that if they pass one and the other conditional in
different runs, they should pass them both in the same run, right?

More concisely, when running with both conditionals they are able to pass
the filter() method for both columns but fail to pass the
filterNotNull() method. The debug log tells me that the
TreeMap<Text,byte[]> passed to filterNotNull() by the HStore scanner
doesn't contain both columns at the same time (the method is called
twice, first with one column and then with the other).

Finally, when running with only one of the conditionals, the filterNotNull()
method still returns true once but returns false the second time
(therefore returning the record), meaning that not all columns of the
same row are passing through the cycle.
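
The behaviour reported above can be reproduced with a small stand-alone model. This is a sketch under stated assumptions, not the HBase source: PerStoreFilterSketch and its filterNotNull are hypothetical, String keys stand in for Text, and the rule assumed is that the row is filtered out unless every conditioned column is present in the map and equals its required value.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of a whole-row check that is handed one column family at a time. */
class PerStoreFilterSketch {
    /** Conditions: column -> required value (models the map given to the filter). */
    static final Map<String, byte[]> CONDITIONS = new HashMap<>();
    static {
        CONDITIONS.put("A:a", "1".getBytes());
        CONDITIONS.put("B:b", "2".getBytes());
    }

    /** Returns true (filter the row out) unless every conditioned column is present and matches. */
    static boolean filterNotNull(TreeMap<String, byte[]> columns) {
        for (Map.Entry<String, byte[]> c : CONDITIONS.entrySet()) {
            byte[] actual = columns.get(c.getKey());
            if (actual == null || !Arrays.equals(actual, c.getValue())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // What each store sees: only the columns of its own family.
        TreeMap<String, byte[]> storeA = new TreeMap<>();
        storeA.put("A:a", "1".getBytes());
        TreeMap<String, byte[]> storeB = new TreeMap<>();
        storeB.put("B:b", "2".getBytes());

        // What the check would need to see: the complete row.
        TreeMap<String, byte[]> fullRow = new TreeMap<>();
        fullRow.putAll(storeA);
        fullRow.putAll(storeB);

        System.out.println(filterNotNull(storeA));  // true: B:b is missing from this view
        System.out.println(filterNotNull(storeB));  // true: A:a is missing from this view
        System.out.println(filterNotNull(fullRow)); // false: both conditions are satisfied
    }
}
```

With a single condition on A:a, the same model returns false for the store holding family A and true for the store holding family B, which matches the "true once, false the second time" observation.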

Regards
David Alves







Re: HQL plan for Hbase 0.2

Posted by Bryan Duxbury <br...@rapleaf.com>.
Edward,

I don't really understand what this example is supposed to be  
suggesting. Can you try to add some explanation to your code?

-Bryan



Re: HQL plan for Hbase 0.2

Posted by edward yoon <ed...@udanax.org>.
Thanks for the reviews.
What do you think about the example below?
Would it be possible?

  map {
    // row is the server host name
    rs = hql.executeQuery("select filePath: from server_data where row='" + row + "'"
         + " and column='" + columnfamily + ":" + qualifier + "';");

    hql.executeQuery("load data file '" + rs.result() + "' into " + resultTable + ";");
  }

  main(String[] args) {
    columnfamily = "webserver";           /* or "hadoop" */
    qualifier = "access_log";             /* or "task_tracker_log" */
    resultTable = "access_log_table";     /* or "task_tracker_log_table" */
  }
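
To make the two statements in the example easier to read, here is a small sketch that only assembles the query strings. It is an illustration under assumptions: HqlQuerySketch and its helper methods are hypothetical, no HQL is executed, the statement syntax is copied from the example above rather than from any HQL specification, and the host name and file path in main are made-up sample values.

```java
/** Hypothetical helpers that build the two statements from the example; nothing is executed. */
class HqlQuerySketch {
    /** Statement 1: look up the file path stored for a host (row) and a family:qualifier column. */
    static String selectFilePath(String row, String columnFamily, String qualifier) {
        return "select filePath: from server_data where row='" + row + "'"
                + " and column='" + columnFamily + ":" + qualifier + "';";
    }

    /** Statement 2: load the file found by statement 1 into the result table. */
    static String loadDataFile(String filePath, String resultTable) {
        return "load data file '" + filePath + "' into " + resultTable + ";";
    }

    public static void main(String[] args) {
        // "host1" and the file path are sample values, not taken from the thread.
        System.out.println(selectFilePath("host1", "webserver", "access_log"));
        System.out.println(loadDataFile("/logs/host1/access_log", "access_log_table"));
    }
}
```

Read this way, the map step runs once per server row: it asks server_data where that host keeps a given log, then loads that file into the table named by resultTable.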




-- 
B. Regards,
Edward yoon @ NHN, corp.

Re: HQL plan for Hbase 0.2

Posted by Bryan Duxbury <br...@rapleaf.com>.
I added a few comments.
-Bryan



Re: HQL plan for Hbase 0.2

Posted by stack <st...@duboce.net>.
I added comments, Edward (thanks for adding the link to your HQL plans 
from the 0.17 plans).
St.Ack


