Posted to dev@metamodel.apache.org by Kasper Sørensen <i....@gmail.com> on 2014/03/24 22:34:25 UTC

Re: [DISCUSS] State of the work-in-progress HBase branch

A quick update on this since the module has now been merged into the master
branch:

1) Module is still read-only. This is accepted for now (unless someone
wants to help change it of course).

2) Metadata mapping is still working in two modes: a) we discover the
column families and expose them as byte-array maps (not very useful, but
works as a "lowest common denominator") and b) the user provides a set of
SimpleTableDefs (which now have a convenient parser btw. :)) and gets his
table mapping as he wants it.
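
As a side note for mode b): the MetaModel column name carries both the
HBase column family and the qualifier, using the "family:name" convention
mentioned further down in this thread. A tiny sketch of that naming
convention (illustrative only, not actual MetaModel code):

```java
// Sketch: splitting a MetaModel-style HBase column name "family:qualifier"
// into its parts, as used by the SimpleTableDef mapping described above.
// Illustrative helper - not code from the MetaModel HBase module.
class HBaseColumnName {
    final String family;
    final String qualifier;

    HBaseColumnName(String metaModelColumnName) {
        int idx = metaModelColumnName.indexOf(':');
        if (idx < 0) {
            // No qualifier given: the name refers to the whole column family.
            family = metaModelColumnName;
            qualifier = null;
        } else {
            family = metaModelColumnName.substring(0, idx);
            qualifier = metaModelColumnName.substring(idx + 1);
        }
    }
}
```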

3) Querying now has special support for lookup-by-id type queries where we
will use HBase Get instead of Scan. We also have good support for
LIMIT/"maxRows", but not OFFSET/"firstRow" (in those cases we will scan
past the first records on the client side).
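
To illustrate the OFFSET/"firstRow" limitation: since an HBase Scan has no
server-side row offset, the skipping has to happen on the client. A minimal
sketch of that client-side paging (hypothetical helper, not the module's
actual code; firstRow is assumed to be 1-based here):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of client-side firstRow/maxRows paging over scan results.
// Illustrative only - not code from the MetaModel HBase module.
class ScanPager {
    static <T> List<T> page(Iterator<T> scanResults, int firstRow, int maxRows) {
        // Scan past the first (firstRow - 1) records on the client side.
        for (int i = 1; i < firstRow && scanResults.hasNext(); i++) {
            scanResults.next();
        }
        // Then collect up to maxRows records (the LIMIT part, which the
        // module can instead optimize via the scan itself).
        List<T> rows = new ArrayList<>();
        while (rows.size() < maxRows && scanResults.hasNext()) {
            rows.add(scanResults.next());
        }
        return rows;
    }
}
```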

4) Dependencies still seem to be a pain. HBase and Hadoop come in many
flavours, and not all are compatible. I doubt there's a lot we can do about
it, except ask the users to provide their own HBase dependency as per their
backend version. We should probably thus make all our HBase/Hadoop
dependencies <optional>true</optional> in order not to influence the
typical clients.
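
For reference, marking the dependency optional in the module's pom.xml
would look something like this (artifact coordinates as discussed above;
the version property is just a placeholder):

```xml
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>${hbase.version}</version>
  <!-- optional: clients supply the hbase-client matching their cluster -->
  <optional>true</optional>
</dependency>
```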

Kasper


2014-02-24 17:08 GMT+01:00 Kasper Sørensen <i....@gmail.com>:

> Hi Henry,
>
> Yea the Phoenix project is definitely an interesting approach to making MM
> capable of working with HBase. The only downside to me is that it seems
> they do a lot of intrusive stuff to HBase like creating new index tables
> etc... I would normally not "allow" that for a simple connector.
>
> Maybe we should simply support both styles. And in the case of Phoenix, I
> guess we could simply go through the JDBC module of MetaModel and connect
> via their JDBC driver... Is that maybe a route, do you know?
>
> - Kasper
>
>
> 2014-02-24 6:37 GMT+01:00 Henry Saputra <he...@gmail.com>:
>
> We could use the HBase client library from the store I suppose.
>> The issue that actually worries me is that adding real query support
>> for a column-based datastore is kind of a big task.
>> Apache Phoenix tried to do that so maybe we could leverage the SQL
>> planner layer to provide the implementation of the query execution to
>> HBase layer?
>>
>> - Henry
>>
>>
>> On Mon, Feb 17, 2014 at 9:33 AM, Kasper Sørensen
>> <i....@gmail.com> wrote:
>> > Thanks for the input Henry. With your experience, do you then also
>> happen
>> > to know of a good thin client-side library? I imagine that we could
>> maybe
>> > use a REST client instead of the full client we currently use. That
>> would
>> > save us a ton of dependency-overhead I think. Or is it a non-issue in
>> your
>> > mind, since HBase users are used to this overhead?
>> >
>> >
>> > 2014-02-16 7:16 GMT+01:00 Henry Saputra <he...@gmail.com>:
>> >
>> >> For 1 > I think adding read-only access to HBase should be OK because
>> >> most updates to HBase go either through the HBase client, REST via
>> >> Stargate [1], or Thrift.
>> >>
>> >> For 2 > In Apache Gora we use Avro to do type mapping to columns and
>> >> generate Java POJOs via the Avro compiler.
>> >>
>> >> For 3 > This is the one I am kinda torn on. Apache Phoenix (incubating)
>> >> tries to provide SQL on HBase [2] via extra indexing and caching. I
>> >> think this defeats the purpose of having NoSQL databases that serve a
>> >> different purpose than relational databases.
>> >>
>> >> I am not sure MetaModel should touch NoSQL databases which are more
>> >> like column stores. These databases are designed for large data with
>> >> access primarily via key, not a query mechanism.
>> >>
>> >> Just my 2-cent
>> >>
>> >>
>> >> [1] http://wiki.apache.org/hadoop/Hbase/Stargate
>> >> [2] http://phoenix.incubator.apache.org/
>> >>
>> >> On Fri, Jan 24, 2014 at 11:35 AM, Kasper Sørensen
>> >> <i....@gmail.com> wrote:
>> >> > Hi everyone,
>> >> >
>> >> > I was looking at our "hbase-module" branch and as much as I like this
>> >> idea,
>> >> > I think we've been a bit too idle with the branch. Maybe we should
>> try to
>> >> > make something final e.g. for a version 4.1.
>> >> >
>> >> > So I thought to give an overview/status of the module's current
>> >> > capabilities and its shortcomings. We should figure out if we
>> this
>> >> > is good enough for a first version, or if we want to do some
>> improvements
>> >> > to the module before adding it to our portfolio of MetaModel modules.
>> >> >
>> >> > 1) The module only offers read-only/query access to HBase. That is
>> in my
>> >> > opinion OK for now, we have several such modules, and this is
>> something
>> >> we
>> >> > can better add later if we straighten out the remaining topics in
>> this
>> >> mail.
>> >> >
>> >> > 2) With regards to metadata mapping: HBase is different because it
>> has
>> >> both
>> >> > column families and in column families there are columns. For the
>> sake of
>> >> > our view on HBase I would describe column families simply as "a
>> >> > logical grouping of columns". Column families are fixed within a
>> >> > table, but rows in a table may contain arbitrary numbers of columns
>> >> > within each column family. So... You can instantiate the
>> >> > HBaseDataContext in two ways:
>> >> >
>> >> > 2a) You can let MetaModel discover the metadata. This unfortunately
>> has a
>> >> > severe limitation. We discover the table names and column families
>> using
>> >> > the HBase API. But the actual columns and their contents cannot be
>> >> provided
>> >> > by the API. So instead we simply expose the column families with a
>> >> > MAP data type. The trouble with this is that the keys and values of
>> >> > the maps will simply be byte-arrays ... Usually not very useful! But
>> >> > it's sort of the only thing (as far as I can see) that's "safe" in
>> >> > HBase, since HBase allows anything (byte arrays) in its columns.
>> >> >
>> >> > 2b) Like in e.g. MongoDb or CouchDb modules you can provide an array
>> of
>> >> > tables (SimpleTableDef). That way the user defines the metadata
>> himself
>> >> and
>> >> > the implementation assumes that it is correct (or else it will
>> break).
>> >> The
>> >> > good thing about this is that the user can define the proper data
>> types
>> >> > etc. for columns. The user defines the column family and column
>> >> > name by defining the MetaModel column name like this: "family:name"
>> >> > (consistent with most HBase tools and API calls).
>> >> >
>> >> > 3) With regards to querying: We've implemented basic query
>> capabilities
>> >> > using the MetaModel query postprocessor. But not all queries are
>> >> > very efficient... In addition to of course full table scans, we have
>> >> > optimized support of COUNT queries and of table scans with maxRows.
>> >> >
>> >> > We could rather easily add optimized support for a couple of other
>> >> typical
>> >> > queries:
>> >> >  * lookup record by ID
>> >> >  * paged table scans (both firstRow and maxRows)
>> >> >  * queries with simple filters/where items
>> >> >
>> >> > 4) With regards to dependencies: The module right now depends on the
>> >> > artifact called "hbase-client". This dependency has a lot of
>> >> > transitive dependencies, so the size of the module is quite extreme.
>> >> > As an example, it includes stuff like jetty, jersey, jackson and of
>> >> > course hadoop... But I am wondering if we can have a thinner client
>> >> > side than that! If anyone knows if e.g. we can use the REST interface
>> >> > easily or so, that would maybe be better. I'm not an expert on HBase
>> >> > though, so please enlighten me!
>> >> >
>> >> > Kind regards,
>> >> > Kasper
>> >>
>>
>
>

Re: [DISCUSS] State of the work-in-progress HBase branch

Posted by Kasper Sørensen <i....@gmail.com>.
Wondering what other people think here ... And if we go for a documentation
site that is "built" and released, how do we then bootstrap it easily with
the knowledge that is currently in the wiki?


2014-03-25 20:35 GMT+01:00 Henry Saputra <he...@gmail.com>:

> Some projects do link back from homepage to wiki page. I think the
> main key is to have separate docs for each release.
>
> What do you think?
>
> - Henry
>
> On Tue, Mar 25, 2014 at 4:47 AM, Kasper Sørensen
> <i....@gmail.com> wrote:
> > Hmm was kinda hoping we wouldn't have to... But that's just because I am
> > lazy and I prefer "live" (editable online) documentation where possible
> > (that way you can easily react if someone starts pointing at missing
> > parts). I think either way is doable, but you're right that in case we
> > use wiki-pages, each wiki page should clearly state which versions it
> > applies to, if it is version-specific.
> >
> >
> > 2014-03-24 23:03 GMT+01:00 Henry Saputra <he...@gmail.com>:
> >
> >> Hmm seems like we need to bundle the docs for each release. For
> >> example, the 4.0.0 release does not have the HBase store.
> >>
> >> Most projects have docs for each release on top of project homepage,
> >> like Zookeeper http://zookeeper.apache.org/doc/r3.4.6/ or Spark
> >> http://spark.apache.org/docs/0.9.0/
> >>
> >> Thoughts?
> >>
> >> - Henry
> >>
> >> On Mon, Mar 24, 2014 at 2:50 PM, Kasper Sørensen
> >> <i....@gmail.com> wrote:
> >> > Hmm I suppose a wiki page would be good. I guess we have wiki pages
> for
> >> > some of the DataContext implementations already like Salesforce [1],
> POJO
> >> > [2] and Composite [3] ... Maybe we should even have a page for
> >> > *every* DataContext implementation there is, simply for completeness
> >> > and referenceability of documentation.
> >> >
> >> > [1] http://wiki.apache.org/metamodel/examples/SalesforceDataContext
> >> > [2] http://wiki.apache.org/metamodel/examples/PojoDataContext
> >> > [3] http://wiki.apache.org/metamodel/examples/CompositeDataContext
> >> >
> >> >
> >> > 2014-03-24 22:44 GMT+01:00 Henry Saputra <he...@gmail.com>:
> >> >
> >> >> Ok +1
> >> >>
> >> >> How do you propose to document this feature? As another page in the
> >> >> doc svn repo?
> >> >>
> >> >> - Henry
> >> >>
> >> >> On Mon, Mar 24, 2014 at 2:42 PM, Kasper Sørensen
> >> >> <i....@gmail.com> wrote:
> >> >> > Yep. Or in slightly more technical terms: It means that the
> >> >> > HBaseDataContext only implements DataContext which has these two
> >> >> > significant methods:
> >> >> >
> >> >> >  * getSchemas()
> >> >> >  * executeQuery(...)
> >> >> >
> >> >> > (Plus a bunch more methods, but those two give you the general
> >> >> impression:
> >> >> > Explore metadata and fire queries / reads)
> >> >> > But not UpdateableDataContext, which has the write operations:
> >> >> >
> >> >> >  * executeUpdate(...)
> >> >> >
> >> >> > Regards,
> >> >> > Kasper
> >> >> >
> >> >> >
> >> >> > 2014-03-24 22:37 GMT+01:00 Henry Saputra <henry.saputra@gmail.com>:
> >> >> >
> >> >> >> Hmm, what does it mean by read only? You can use it to read data
> from
> >> >> >> HBase?
> >> >> >>
> >> >> >> - Henry
> >> >> >>

Re: [DISCUSS] State of the work-in-progress HBase branch

Posted by Henry Saputra <he...@gmail.com>.
Some projects do link back from homepage to wiki page. I think the
main key is to have separate docs for each release.

What do you think?

- Henry

>> >> other
>> >> >> >>> >> typical
>> >> >> >>> >> > queries:
>> >> >> >>> >> >  * lookup record by ID
>> >> >> >>> >> >  * paged table scans (both firstRow and maxRows)
>> >> >> >>> >> >  * queries with simple filters/where items
>> >> >> >>> >> >
>> >> >> >>> >> > 4) With regards to dependencies: The module right now
>> depends
>> >> on
>> >> >> the
>> >> >> >>> >> > artifact called "hbase-client". This dependency has a loot
>> of
>> >> >> >>> transient
>> >> >> >>> >> > dependencies so the size of the module is quite extreme. As
>> an
>> >> >> >>> example,
>> >> >> >>> >> it
>> >> >> >>> >> > includes stuff like jetty, jersey, jackson and of course
>> >> hadoop...
>> >> >> >>> But I
>> >> >> >>> >> am
>> >> >> >>> >> > wondering if we can have a more thin client-side than that!
>> If
>> >> >> anyone
>> >> >> >>> >> knows
>> >> >> >>> >> > if e.g. we can use the REST interface easily or so, that
>> would
>> >> >> maybe
>> >> >> >>> be
>> >> >> >>> >> > better. I'm not an expert on HBase though, so please
>> enlighten
>> >> me!
>> >> >> >>> >> >
>> >> >> >>> >> > Kind regards,
>> >> >> >>> >> > Kasper
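The "family:name" column naming convention from point 2b can be illustrated with a small self-contained sketch. This is not the actual MetaModel parser; the helper below is a hypothetical illustration of the split-at-first-colon rule the mail describes:

```java
// Hypothetical illustration of the "family:name" convention from 2b above.
// Not the actual MetaModel implementation.
public class ColumnNameSketch {

    /** Splits "family:qualifier" at the first colon; returns {family, qualifier}. */
    static String[] splitColumnName(String columnName) {
        int idx = columnName.indexOf(':');
        if (idx < 0) {
            // No explicit family part - treat the whole name as the qualifier.
            return new String[] { null, columnName };
        }
        return new String[] { columnName.substring(0, idx), columnName.substring(idx + 1) };
    }

    public static void main(String[] args) {
        String[] parts = splitColumnName("cf1:age");
        System.out.println(parts[0] + " / " + parts[1]); // prints "cf1 / age"
    }
}
```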
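The `<optional>true</optional>` idea from the update at the top of this thread would look roughly like this in the module's pom. A sketch only: the version property is a placeholder, and the exact list of artifacts to mark optional would need to be decided:

```xml
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>${hbase.version}</version>
  <!-- Not propagated to downstream consumers; clients supply their own
       HBase/Hadoop dependencies matching their backend version. -->
  <optional>true</optional>
</dependency>
```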

Re: [DISCUSS] State of the work-in-progress HBase branch

Posted by Kasper Sørensen <i....@gmail.com>.
Hmm, I was kinda hoping we wouldn't have to... But that's just because I am
lazy and I prefer "live" (editable online) documentation where possible
(that way you can easily react if someone starts pointing at missing
parts). I think either way is doable, but you're right that in case we use
wiki pages, each wiki page should clearly state which versions it applies
to, if it is version-specific.


2014-03-24 23:03 GMT+01:00 Henry Saputra <he...@gmail.com>:

> Hmm, seems like we need to bundle the docs for each release. For
> example, the 4.0.0 release does not have the HBase store.
>
> Most projects have docs for each release on top of project homepage,
> like Zookeeper http://zookeeper.apache.org/doc/r3.4.6/ or Spark
> http://spark.apache.org/docs/0.9.0/
>
> Thoughts?
>
> - Henry
>
> On Mon, Mar 24, 2014 at 2:50 PM, Kasper Sørensen
> <i....@gmail.com> wrote:
> > Hmm I suppose a wiki page would be good. I guess we have wiki pages for
> > some of the DataContext implementations already like Salesforce [1], POJO
> > [2] and Composite [3] ... Maybe we should even have a page for *every*
> > DataContext
> > implementation there is, simply for completeness and referenceability of
> > documentation.
> >
> > [1] http://wiki.apache.org/metamodel/examples/SalesforceDataContext
> > [2] http://wiki.apache.org/metamodel/examples/PojoDataContext
> > [3] http://wiki.apache.org/metamodel/examples/CompositeDataContext
> >
> >
> > 2014-03-24 22:44 GMT+01:00 Henry Saputra <he...@gmail.com>:
> >
> >> Ok +1
> >>
> >> How do you propose to document this feature? As another page in the
> >> doc svn repo?
> >>
> >> - Henry
> >>
> >> On Mon, Mar 24, 2014 at 2:42 PM, Kasper Sørensen
> >> <i....@gmail.com> wrote:
> >> > Yep. Or in slightly more technical terms: It means that the
> >> > HBaseDataContext only implements DataContext which has these two
> >> > significant methods:
> >> >
> >> >  * getSchemas()
> >> >  * executeQuery(...)
> >> >
> >> > (Plus a bunch more methods, but those two give you the general
> >> impression:
> >> > Explore metadata and fire queries / reads)
> >> > But not UpdateableDataContext, which has the write operations:
> >> >
> >> >  * executeUpdate(...)
> >> >
> >> > Regards,
> >> > Kasper
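The read-only distinction Kasper describes can be sketched with simplified stand-in interfaces. These are not the real MetaModel signatures (the real interfaces have more methods); the sketch only illustrates the DataContext / UpdateableDataContext split and the resulting capability check:

```java
// Simplified stand-ins for the MetaModel interfaces discussed above;
// the real interfaces have more methods and different signatures.
interface DataContext {
    String[] getSchemas();
}

interface UpdateableDataContext extends DataContext {
    void executeUpdate(String update);
}

public class ReadOnlyCheckSketch {

    /** A store is writable only if its context also implements the update interface. */
    static boolean isWritable(DataContext dc) {
        return dc instanceof UpdateableDataContext;
    }

    public static void main(String[] args) {
        // Read-only context, like the HBase module described in this thread.
        DataContext hbaseLike = () -> new String[] { "hbase" };
        System.out.println(isWritable(hbaseLike)); // prints "false"
    }
}
```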
> >> >
> >> >
> >> > 2014-03-24 22:37 GMT+01:00 Henry Saputra <he...@gmail.com>:
> >> >
> >> >> Hmm, what does it mean by read only? You can use it to read data from
> >> >> HBase?
> >> >>
> >> >> - Henry
> >> >>
> >> >> >> 2014-02-24 6:37 GMT+01:00 Henry Saputra <henry.saputra@gmail.com
> >:
> >> >> >>
> >> >> >> We could use the HBase client library from the store I suppose.
> >> >> >>> The issue I am actually worried about is that adding real query
> >> >> >>> support for a column-based datastore is kind of a big task.
> >> >> >>> Apache Phoenix tried to do that, so maybe we could leverage the
> >> >> >>> SQL planner layer to provide the implementation of the query
> >> >> >>> execution to the HBase layer?
> >> >> >>>
> >> >> >>> - Henry
> >> >> >>>
> >> >> >>>
> >> >> >>> On Mon, Feb 17, 2014 at 9:33 AM, Kasper Sørensen
> >> >> >>> <i....@gmail.com> wrote:
> >> >> >>> > Thanks for the input Henry. With your experience, do you then
> >> >> >>> > also happen to know of a good thin client-side library? I imagine
> >> >> >>> > that we could maybe use a REST client instead of the full client
> >> >> >>> > we currently use. That would save us a ton of dependency overhead,
> >> >> >>> > I think. Or is it a non-issue in your mind, since HBase users are
> >> >> >>> > used to this overhead?
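If the REST route Kasper asks about were taken, a thin client could issue plain HTTP requests against the Stargate interface (see link [1] further down). A hedged sketch: the `/table/row` path layout is an assumption based on typical Stargate setups, and no network call is made here:

```java
// Sketch of building a Stargate-style REST row-lookup URL.
// The "/table/row" path layout is an assumption about a typical Stargate
// deployment; a real client would also set an Accept header and do the call.
public class StargateUrlSketch {

    static String rowUrl(String host, int port, String table, String rowKey) {
        return "http://" + host + ":" + port + "/" + table + "/" + rowKey;
    }

    public static void main(String[] args) {
        System.out.println(rowUrl("localhost", 8080, "mytable", "row1"));
        // prints "http://localhost:8080/mytable/row1"
    }
}
```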
> >> >> >>> >
> >> >> >>> >
> >> >> >>> > 2014-02-16 7:16 GMT+01:00 Henry Saputra <
> henry.saputra@gmail.com
> >> >:
> >> >> >>> >
> >> >> >>> >> For 1 > I think adding read-only access to HBase should be OK
> >> >> >>> >> because most updates to HBase go either through the HBase client
> >> >> >>> >> or REST via Stargate [1] or Thrift.
> >> >> >>> >>
> >> >> >>> >> For 2 > In Apache Gora we use Avro to do type mapping to columns
> >> >> >>> >> and generate Java POJOs via the Avro compiler.
> >> >> >>> >>
> >> >> >>> >> For 3 > This is the one I am kinda torn on. Apache Phoenix
> >> >> >>> >> (incubating) tries to provide SQL to HBase [2] via extra indexing
> >> >> >>> >> and caching. I think this defeats the purpose of having NoSQL
> >> >> >>> >> databases that serve a different purpose than relational
> >> >> >>> >> databases.
> >> >> >>> >>
> >> >> >>> >> I am not sure MetaModel should touch NoSQL databases which are
> >> >> >>> >> more like column stores. These databases are designed for large
> >> >> >>> >> data with access primarily via key and not via a query mechanism.
> >> >> >>> >>
> >> >> >>> >> Just my 2-cent
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> [1] http://wiki.apache.org/hadoop/Hbase/Stargate
> >> >> >>> >> [2] http://phoenix.incubator.apache.org/

Re: [DISCUSS] State of the work-in-progress HBase branch

Posted by Henry Saputra <he...@gmail.com>.
Hmm, seems like we need to bundle the docs for each release. For
example, the 4.0.0 release does not have the HBase store.

Most projects have docs for each release on top of project homepage,
like Zookeeper http://zookeeper.apache.org/doc/r3.4.6/ or Spark
http://spark.apache.org/docs/0.9.0/

Thoughts?

- Henry


Re: [DISCUSS] State of the work-in-progress HBase branch

Posted by Kasper Sørensen <i....@gmail.com>.
Hmm I suppose a wiki page would be good. I guess we have wiki pages for
some of the DataContext implementations already like Salesforce [1], POJO
[2] and Composite [3] ... Maybe we should even have a page for *every*
DataContext
implementation there is, simply for completeness and referenceability of
documentation.

[1] http://wiki.apache.org/metamodel/examples/SalesforceDataContext
[2] http://wiki.apache.org/metamodel/examples/PojoDataContext
[3] http://wiki.apache.org/metamodel/examples/CompositeDataContext


2014-03-24 22:44 GMT+01:00 Henry Saputra <he...@gmail.com>:

> Ok +1
>
> How do you propose to document this feature? As another page in the
> doc svn repo?
>
> - Henry
>
> On Mon, Mar 24, 2014 at 2:42 PM, Kasper Sørensen
> <i....@gmail.com> wrote:
> > Yep. Or in slightly more technical terms: It means that the
> > HBaseDataContext only implements DataContext which has these two
> > significant methods:
> >
> >  * getSchemas()
> >  * executeQuery(...)
> >
> > (Plus a bunch more methods, but those two give you the general
> impression:
> > Explore metadata and fire queries / reads)
> > But not UpdateableDataContext, which has the write operations:
> >
> >  * executeUpdate(...)
> >
> > Regards,
> > Kasper
> >
> >
> > 2014-03-24 22:37 GMT+01:00 Henry Saputra <he...@gmail.com>:
> >
> >> Hmm, what does it mean by read only? You can use it to read data from
> >> HBase?
> >>
> >> - Henry
> >>
> >> On Mon, Mar 24, 2014 at 2:34 PM, Kasper Sørensen
> >> <i....@gmail.com> wrote:
> >> > A quick update on this since the module has now been merged into the
> >> master
> >> > branch:
> >> >
> >> > 1) Module is still read-only. This is accepted for now (unless someone
> >> > wants to help change it of course).
> >> >
> >> > 2) Metadata mapping is still working in two modes: a) we discover the
> >> > column families and expose them as byte-array maps (not very useful,
> but
> >> > works as a "lowest common denominator") and b) the user provides a
> set of
> >> > SimpleTableDef (which now has a convenient parser btw.:)) and gets his
> >> > table mapping as he wants it.
> >> >
> >> > 3) Querying now has special support for lookup-by-id type queries
> where
> >> we
> >> > will use HBase Get instead of Scan. We also have good support for
> >> > LIMIT/"maxRows", but not OFFSET/"firstRow" (in those cases we will
> scan
> >> > past the first records on the client side).
> >> >
> >> > 4) Dependencies seems to be a pain still. HBase and Hadoop comes in
> many
> >> > flavours and all are not compatible. I doubt there's a lot we can do
> >> about
> >> > it, except ask the users to provide their own HBase dependency as per
> >> their
> >> > backend version. We should probably thus make all our HBase/Hadoop
> >> > dependencies <optional>true</optional> in order to not influence the
> >> > typical clients.
> >> >
> >> > Kasper

Re: [DISCUSS] State of the work-in-progress HBase branch

Posted by Henry Saputra <he...@gmail.com>.
Ok +1

How do you propose to document this feature? As another page in the
doc svn repo?

- Henry

On Mon, Mar 24, 2014 at 2:42 PM, Kasper Sørensen
<i....@gmail.com> wrote:
> Yep. Or in slightly more technical terms: it means that
> HBaseDataContext only implements DataContext, which has these two
> significant methods:
>
>  * getSchemas()
>  * executeQuery(...)
>
> (Plus a bunch more methods, but those two give you the general impression:
> explore metadata and fire queries/reads.)
> It does not implement UpdateableDataContext, which has the write operation:
>
>  * executeUpdate(...)
>
> Regards,
> Kasper
>
>

Re: [DISCUSS] State of the work-in-progress HBase branch

Posted by Kasper Sørensen <i....@gmail.com>.
Yep. Or in slightly more technical terms: it means that
HBaseDataContext only implements DataContext, which has these two
significant methods:

 * getSchemas()
 * executeQuery(...)

(Plus a bunch more methods, but those two give you the general impression:
explore metadata and fire queries/reads.)
It does not implement UpdateableDataContext, which has the write operation:

 * executeUpdate(...)

Regards,
Kasper
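
The split described above can be illustrated with a self-contained sketch. These are simplified stand-ins for illustration, not the real org.apache.metamodel interfaces:

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for the two MetaModel interfaces described above.
// Sketches for illustration only, not the real org.apache.metamodel types.
interface DataContext {
    List<String> getSchemas();                  // explore metadata
    List<Object[]> executeQuery(String query);  // fire queries / reads
}

interface UpdateableDataContext extends DataContext {
    void executeUpdate(String update);          // write operations
}

// A read-only connector implements DataContext only, so there is
// no executeUpdate to call: writes are ruled out at compile time.
class ReadOnlyHBaseSketch implements DataContext {
    @Override
    public List<String> getSchemas() {
        return Arrays.asList("HBase");
    }

    @Override
    public List<Object[]> executeQuery(String query) {
        // A real implementation would translate the query to HBase reads.
        return Arrays.asList(new Object[] { "row-key-1" });
    }
}
```

A caller holding a DataContext reference can use an `instanceof UpdateableDataContext` check to find out whether the backend supports writes at all.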


2014-03-24 22:37 GMT+01:00 Henry Saputra <he...@gmail.com>:

> Hmm, what does it mean by read-only? You can use it to read data from
> HBase?
>
> - Henry

Re: [DISCUSS] State of the work-in-progress HBase branch

Posted by Henry Saputra <he...@gmail.com>.
Hmm, what does it mean by read-only? You can use it to read data from HBase?

- Henry

On Mon, Mar 24, 2014 at 2:34 PM, Kasper Sørensen
<i....@gmail.com> wrote:
> A quick update on this since the module has now been merged into the master
> branch:
>
> 1) Module is still read-only. This is accepted for now (unless someone
> wants to help change it of course).
>
> 2) Metadata mapping still works in two modes: a) we discover the
> column families and expose them as byte-array maps (not very useful, but
> works as a "lowest common denominator"), and b) the user provides a set of
> SimpleTableDefs (which now have a convenient parser, btw) and gets the
> table mapping exactly as he wants it.
>
> 3) Querying now has special support for lookup-by-id queries, where we
> use HBase Get instead of Scan. We also have good support for
> LIMIT/"maxRows", but not OFFSET/"firstRow" (in those cases we scan
> past the first records on the client side).
>
> 4) Dependencies still seem to be a pain. HBase and Hadoop come in many
> flavours, and not all are compatible. I doubt there's a lot we can do about
> it, except ask users to provide their own HBase dependency matching their
> backend version. We should thus probably make all our HBase/Hadoop
> dependencies <optional>true</optional> so as not to influence
> typical clients.
>
> Kasper
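
The SimpleTableDef mapping mentioned in point 2 names columns as "family:name" (consistent with most HBase tools and API calls). Splitting such a name into its HBase column family and qualifier boils down to cutting at the first colon; a rough sketch of the convention, not MetaModel's actual parser:

```java
// Sketch: split a MetaModel column name of the form "family:qualifier"
// into its HBase column family and column qualifier. Illustrates the
// naming convention only; this is not MetaModel's SimpleTableDef parser.
final class HBaseColumnName {
    final String family;
    final String qualifier;

    HBaseColumnName(String metaModelColumnName) {
        // Cut at the first colon; a qualifier may itself contain colons.
        int colon = metaModelColumnName.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException(
                    "Expected \"family:name\", got: " + metaModelColumnName);
        }
        this.family = metaModelColumnName.substring(0, colon);
        this.qualifier = metaModelColumnName.substring(colon + 1);
    }
}
```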
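
For reference, the <optional>true</optional> approach from point 4 of the update would look roughly like this in the module's pom.xml. The version shown is illustrative only; the point is that clients then have to declare their own HBase/Hadoop flavour:

```xml
<!-- Sketch: mark the HBase client as optional so that the module's
     heavy transitive dependencies (hadoop, jetty, jersey, jackson, ...)
     are not forced onto typical MetaModel clients. Clients that use the
     HBase module declare their own hbase-client matching their backend. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>0.96.1.1-hadoop2</version>
  <optional>true</optional>
</dependency>
```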