Posted to dev@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2008/11/17 20:27:44 UTC

[jira] Commented: (HBASE-1002) Small query language for filters

    [ https://issues.apache.org/jira/browse/HBASE-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648269#action_12648269 ] 

Andrew Purtell commented on HBASE-1002:
---------------------------------------

Here is a conversation about this on IRC:

(10:41:16 AM) ffgeek200: are there any thoughts regarding a future query languages for row filters?
(10:42:43 AM) st^Ack_: ffgeek200: how do you mean?
(10:44:06 AM) ffgeek200: st^Ack_: ie give me all rows where "(int(col("entry:price")) > 3 && col("entry:name")=="ABC" || col("entry:name")="XYZ"
(10:45:28 AM) apurtell: ffgeek200: filter spec -> little language compiler -> specialized bytecode -> execution on regionserver during scanner traversal ?
(10:46:11 AM) ffgeek200: apurtell: yes
(10:46:56 AM) apurtell: ffgeek200: what about filter spec -> little language compiler -> code to build existing (maybe modified a little) filter class hierarchies -> send to regionserver in the current manner?
(10:49:30 AM) ffgeek200: apurtell: it could be implemented in many ways yes. That is another way. What about something crazy like writing java code that implements a RowFilterInterface method "boolean isFiltered(Row row)", then serialize that class over the network... let Java deal with compilation since it does that well.
(10:50:01 AM) st^Ack_: or apurtell, how about a jruby filter? You pass it jruby code, and it runs it on every row?
(10:50:51 AM) ffgeek200: jruby would work. I remember reading about a similar database and they used server-side javascript for this purpose.
(10:52:16 AM) apurtell: stack,ffgeek200: jruby snippet is good. was going to reply that java serialization only works if the classes are available at each endpoint (java serialization does not ship code afaik).
(10:54:37 AM) ffgeek200: apurtell: true. I think that would be cleaner than how it is currently done, trying to munge your row filter to do what you want.
(10:55:12 AM) st^Ack_: apurtell: yes that the classes would have to be on CLASSPATH on either end of the serialization. jruby script would be better (this jruby suggestion is just your filter spec -> little language compiler -> etc. suggestion generalized)
(10:56:09 AM) apurtell: ffgeek200,stack: downside to jruby snippet is it is an untrusted code upload to regionserver. that's why i suggested using existing classes, which cause only restricted/controlled actions to happen in the regionserver. on the other hand jruby snippets can be managed when access control is added in a manner similar to how rdbms controls stored procedures.
(10:56:52 AM) st^Ack_: apurtell: you are right
(10:57:08 AM) st^Ack_: very hard preventing jruby snippet running riot
(10:57:15 AM) apurtell: stack: indeed
(10:58:13 AM) ffgeek200: apurtell: postgres allows for sprocs to be in pretty much all popular languages, but I'm not sure if it restricts the sprocs.
(11:00:53 AM) ffgeek200: example of how they do it with ruby: http://www.april-child.com/blog/2007/05/10/running-ruby-in-postgresql-on-mac-os-x/
(11:01:42 AM) apurtell: ffgeek200: stored procedure access control is rwx by user plus setuid typically, to use a fs metaphor.
(11:02:02 AM) apurtell: ffgeek200: at least that was what i was referring to.
(11:03:11 AM) ffgeek200: apurtell: I think it would definitely open a can of worms security-wise. For me it's fine since I'm in control of everything over here, but others may want restrictions on its usage, maybe they would choose to not compile it in.
(11:04:25 AM) ffgeek200: no matter what security restrictions you impose, they can of course always sit in a while loop and burn CPU.
(11:05:33 AM) jgray: ffgeek200: postgres has safe and unsafe integration with other languages for stored procedures
(11:05:43 AM) apurtell: it does seem to me that a little language compiler that builds hierarchies of filters in the current form is a desirable feature. can be some kind of contrib. common query operators can be supported, and the class implementation server side maintains safety. and anything the "compiler" might do can be constructed by hand as desired (no api changes).
(11:10:34 AM) ffgeek200: jgray: thanks I forgot about that. apurtell: sounds awesome. I'm biased re: postgres since I think it does a good job of this. What if that little language compiler was done for now, calling it something like hbaseql then later on other languages could be implemented, but the default one is hbaseql.


> Small query language for filters
> --------------------------------
>
>                 Key: HBASE-1002
>                 URL: https://issues.apache.org/jira/browse/HBASE-1002
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: filters
>            Reporter: Andrew Purtell
>            Priority: Minor
>
> Improve the usability of filters by making them specifiable or executable using a little query language. 
> For example:
>     col("entry:price") > 3 && (col("entry:name") = "ABC" || col("entry:name") = "XYZ")
> Can be implemented as a little language compiler that takes filter specifications as input and builds the requisite hierarchy of filter API classes and actions as emitted Java code.
> Can also be implemented using JRuby snippets sent to the regionserver for execution, but this has troublesome security implications.
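The compiler approach in the issue description can be sketched in Java as a small recursive-descent parser. It parses the filter spec and builds a hierarchy of filter objects. This is a minimal, self-contained illustration under stated assumptions, not HBase code:

- `FilterQL`, `RowPredicate`, `ColumnCompare`, `And` and `Or` are hypothetical stand-ins for the real filter classes.
- A row is modeled as a map from `family:qualifier` names to string values instead of byte arrays.
- Values are compared numerically when both sides parse as numbers and lexicographically otherwise.
- `&&` binds tighter than `||`.
- Both `=` and `==` are accepted, since the examples above use both.

```java
import java.util.Map;

/** Sketch: compile a small filter language into a tree of hypothetical row filters. */
public class FilterQL {
    /** Hypothetical stand-in for a server-side row filter; NOT the HBase API. */
    public interface RowPredicate { boolean matches(Map<String, String> row); }
    /** Leaf node: compares one column's value with a literal. */
    record ColumnCompare(String column, String op, String literal) implements RowPredicate {
        public boolean matches(Map<String, String> row) {
            String value = row.get(column);
            if (value == null) return false; // a missing column never matches
            int cmp; // numeric if both sides parse as numbers (assumption), else lexicographic
            try { cmp = Double.compare(Double.parseDouble(value), Double.parseDouble(literal)); }
            catch (NumberFormatException e) { cmp = value.compareTo(literal); }
            return switch (op) {
                case "=" -> cmp == 0;
                case ">" -> cmp > 0;
                case "<" -> cmp < 0;
                default -> throw new IllegalStateException("unknown operator " + op);
            };
        }
    }
    /** Interior nodes: both children must pass (And) or either may pass (Or). */
    record And(RowPredicate l, RowPredicate r) implements RowPredicate {
        public boolean matches(Map<String, String> row) { return l.matches(row) && r.matches(row); }
    }
    record Or(RowPredicate l, RowPredicate r) implements RowPredicate {
        public boolean matches(Map<String, String> row) { return l.matches(row) || r.matches(row); }
    }
    /** Recursive descent. or := and ("||" and)* ; and := prim ("&&" prim)* ;
     *  prim := "(" or ")" | col("name") ("=" | "==" | ">" | "<") (string | number) */
    static final class Parser {
        private final String src;
        private int pos = 0;
        Parser(String src) { this.src = src; }
        RowPredicate parse() {
            RowPredicate result = parseOr();
            skipSpaces();
            if (pos != src.length()) throw error("trailing input");
            return result;
        }
        private RowPredicate parseOr() { // lowest precedence, left-associative
            RowPredicate left = parseAnd();
            while (accept("||")) left = new Or(left, parseAnd());
            return left;
        }
        private RowPredicate parseAnd() { // && binds tighter than ||
            RowPredicate left = parsePrimary();
            while (accept("&&")) left = new And(left, parsePrimary());
            return left;
        }
        private RowPredicate parsePrimary() {
            if (accept("(")) { RowPredicate inner = parseOr(); expect(")"); return inner; }
            expect("col"); expect("(");
            String column = parseString();
            expect(")");
            String op;
            if (accept("==") || accept("=")) op = "="; // accept both spellings of equality
            else if (accept(">")) op = ">";
            else if (accept("<")) op = "<";
            else throw error("expected comparison operator");
            skipSpaces();
            String literal = src.startsWith("\"", pos) ? parseString() : parseNumber();
            return new ColumnCompare(column, op, literal);
        }
        private String parseString() { // no escape sequences in this sketch
            expect("\"");
            int end = src.indexOf('"', pos);
            if (end < 0) throw error("unterminated string");
            String s = src.substring(pos, end);
            pos = end + 1;
            return s;
        }
        private String parseNumber() {
            int start = pos;
            while (pos < src.length() && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) pos++;
            if (start == pos) throw error("expected a string or number literal");
            return src.substring(start, pos);
        }
        private boolean accept(String t) { // consume t if it is the next token
            skipSpaces();
            if (!src.startsWith(t, pos)) return false;
            pos += t.length();
            return true;
        }
        private void expect(String t) { if (!accept(t)) throw error("expected '" + t + "'"); }
        private void skipSpaces() { while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++; }
        private IllegalArgumentException error(String m) { return new IllegalArgumentException(m + " at " + pos); }
    }
    /** Compiles a filter spec into a predicate tree, or throws IllegalArgumentException. */
    public static RowPredicate compile(String spec) { return new Parser(spec).parse(); }
    public static void main(String[] args) {
        RowPredicate f = compile("col(\"entry:price\") > 3 && (col(\"entry:name\") = \"ABC\" || col(\"entry:name\") = \"XYZ\")");
        System.out.println(f); // the compiled And / Or / ColumnCompare tree
        System.out.println(f.matches(Map.of("entry:price", "5", "entry:name", "XYZ"))); // true
    }
}
```

`compile` only assembles ordinary filter objects, so any tree it produces could also be built by hand, for example `new And(a, new Or(b, c))`. This matches the point in the discussion that such a compiler needs no API changes and leaves only existing, trusted filter classes running on the regionserver. Evaluating the tree during scanner traversal is not modeled here.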

-- 
This message is automatically generated by JIRA.