You are viewing a plain text version of this content. The canonical link for it is here.

Posted to common-user@hadoop.apache.org by Tarandeep Singh <ta...@gmail.com> on 2009/03/02 20:33:26 UTC

Announcing CloudBase-1.2.1 release

Hi,

We have just released 1.2.1 version of CloudBase on sourceforge-
http://cloudbase.sourceforge.net

[ CloudBase is a data warehouse system built on top of Hadoop's Map-Reduce
architecture. It uses ANSI SQL as its query language and comes with a JDBC
driver. It is developed by Business.com and is released to open source
community under GNU GPL license]

This release fixes one issue with the 1.2 release- Table Indexing feature
was not enabled in the 1.2 release. This release fixes this issue.

Also we have updated the svn repository on the sourceforge site and we
invite contributors to work with us to improve CloudBase. The svn repository
url is-
https://cloudbase.svn.sourceforge.net/svnroot/cloudbase/trunk

We will be uploading Developer's guide/documentation on the CloudBase
website very soon. Meanwhile, if someone wants to try compiling the code and
play around with it, please contact me, I can help you get started.

Thanks,
Taran

Re: Announcing CloudBase-1.2.1 release

Posted by tim robertson <ti...@gmail.com>.

Thanks Taran for the detailed explanation - makes perfect sense.




2009/3/3 Tarandeep Singh <ta...@gmail.com>:
> Hi Tim,
>
> I am currently writing developer's guide for CloudBase. It will explain the
> CloudBase design in detail as well as the algorithms used to do Joins,
> Indexing etc.
>
> However, for indexing here is a brief introduction which you can also find
> on the CloudBase website-
> http://cloudbase.sourceforge.net/index.html#userDoc
>
> -------------------------------------------------------------------
> CloudBase supports Hash Indexing on any column of a table. Index can be
> created on multiple columns as well (separate index for each column). During
> indexing process, CloudBase creates an Inverted Index for column values. An
> Inverted Index can be viewed as a persistent HashMap where keys are the
> column values and values are the lists containing information about Data
> Blocks containing the column values.
>
> CloudBase also stores the MIN and MAX value of a column (if its type is
> numeric) present in a Data Block. These values are useful for range queries.
>
> During query execution process, if an indexed column is present in a WHERE
> clause (for example WHERE c1 = 100 or c2 > 100) then CloudBase will locate
> all those Data Blocks (from its index data) where these column values are
> present and load only those Data Blocks instead of loading all Data Blocks.
>
> This indexing is very useful when query involves filtering on date or date
> range and we have seen performance improvements of up to 97% in such cases.
> -------------------------------------------------------------------
>
>
> For Joins, I have not explained the algorithm in the documentation, however
> I am planning to do it so very soon. However, the process is explained in
> the code comments. For your convenience I am pasting the comment from the
> code here-
>
> ----------------------------------------------------
>  /**
>   * Executes Map/Reduce job to handle join (inner or outer join) between
> two
>   * tables/subqueries.
>   * <p>
>   * It uses semi-join approach to optimize the join process. Join algorithm
> for
>   * inner join is as follows- first smaller table is reduced using the
> columns
>   * participating in join condition as Keys (for MR job). Also,
> bloom-filter
>   * for columns participating in join is constructed which is used to
> filter
>   * out the rows of bigger table during the map process(later). Bigger
> table is also
>   * reduced using the join columns as Keys. When rows of bigger table reach
>   * reducers, rows of smaller table are read back from HDFS and join is
>   * performed.
>   *<p>
>   * This algorithm makes use of one property of Map-Reduce paradigm - Same
> keys
>   * go to one reducer. So this means, rows (of both tables) having same
> values
>   * of columns participating in join will go to same reducer (as join
> columns
>   * are used as Keys) and hence a join can be easily performed at reducer.
>   * Although, it seems reading rows of smaller table from HDFS and
> performing
>   * join will be in-efficient, it is not. There are two reasons for the
> same-
>   * first, Hadoop tries to store a copy of the data at the node which
> produced
>   * that data. So this means, a reducer will be reading a local copy of the
>
>   * rows of smaller table. Second reason is, a Map-Reduce job, sort the
> keys
>   * so this means, rows of both smaller and bigger table are sorted on the
>   * join columns. This makes it easier to read rows and perform join.
>   * <p>
>   * In case of left outer join, left table is always reduced first. In case
>   * of right outer join, right table is reduced first.
>   *
>   * @see http://en.wikipedia.org/wiki/Relational_algebra#Semijoin
>   * @see http://en.wikipedia.org/wiki/Bloom_filter
>   */
> ----------------------------------------------------
>
> Thanks,
> Taran
>
> 2009/3/3 tim robertson <ti...@gmail.com>
>
>> Hi Taran,
>>
>> Have you a blog or something that explains how you process joins on
>> cloudbase?
>>
>> E.g. how are indexes used, and how do you go through the joining using
>> the data files and index files.
>> Do you look at all possible indexes, determine the cardinality of each
>> and from this pick a join order, or do you start at both ends of the
>> query and meet in the middle (if you know what I mean...).  This
>> really influences schema design for large datasets in mysql (e.g. you
>> need to store your own cardinality as mysql can't determine the best
>> join order inherently) so I am wondering about porting my reporting
>> application.
>>
>> I think this kind of info would be great for cloudbase docs.
>>
>> Cheers,
>>
>> Tim
>>
>>
>> 2009/3/3 Tarandeep Singh <ta...@gmail.com>:
>> > Tim is right. CloudBase is not equivalent to HBase.
>> >
>> > HBase is column oriented database based on Google's BigTable. CloudBase
>> is a
>> > database/data warehosue layer on top of Hadoop and by means of its SQL
>> > interface makes it easier to mine logs. So instead of writing Map-Reduce
>> > jobs for analyzing data, one can use SQL to do the same and SQL to Map
>> > Reduce job translation is handled by CloudBase.
>> >
>> > -Taran
>> >
>> > 2009/3/3 tim robertson <ti...@gmail.com>
>> >
>> >> Hi Praveen,
>> >>
>> >> I think it is more equivalent to Hive than HBase - both offer joins
>> >> and structured querying where HBase is more a column oriented data
>> >> store with many to ones embedded in a single row and (currently) only
>> >> indexes on the primary key, but secondary keys are coming.  I
>> >> anticipate using HBase as a back end to harvest into, but might make
>> >> use of Hive or Cloudbase for ad hoc reporting when needed.
>> >>
>> >> Has anyone done any testing of Hive vs. Cloudbase for performance and
>> >> comparison of features?
>> >>
>> >> Cheers,
>> >>
>> >> Tim
>> >>
>> >>
>> >> 2009/3/3 Guttikonda, Praveen <pr...@hp.com>:
>> >> > Hi ,
>> >> > Will this be competing in a sense with HBASE then ?
>> >> >
>> >> > Cheers,
>> >> > Praveen
>> >> >
>> >> > -----Original Message-----
>> >> > From: Tarandeep Singh [mailto:tarandeep@gmail.com]
>> >> > Sent: Tuesday, March 03, 2009 10:12 PM
>> >> > To: core-user@hadoop.apache.org
>> >> > Subject: Re: Announcing CloudBase-1.2.1 release
>> >> >
>> >> > Hi Lukas,
>> >> >
>> >> > Yes, you are right. As of now, CloudBase does not support unique keys
>> and
>> >> foreign keys on tables. CloudBase is designed as a database abstraction
>> >> layer on top of Hadoop, thus making it easier to query/mine logs/huge
>> data
>> >> easily.
>> >> >
>> >> > -Taran
>> >> >
>> >> >
>> >> > On Tue, Mar 3, 2009 at 1:15 AM, Lukáš Vlček <lu...@gmail.com>
>> >> wrote:
>> >> >
>> >> >> Hi Taran,
>> >> >> This looks impressive. I quickly looked at the documentation, am I
>> >> >> right that it does not support unique keys and foreign keys for
>> tables?
>> >> >>
>> >> >> Regards,
>> >> >> Lukas
>> >> >>
>> >> >> On Mon, Mar 2, 2009 at 8:33 PM, Tarandeep Singh <tarandeep@gmail.com
>> >
>> >> >> wrote:
>> >> >>
>> >> >> > Hi,
>> >> >> >
>> >> >> > We have just released 1.2.1 version of CloudBase on sourceforge-
>> >> >> > http://cloudbase.sourceforge.net
>> >> >> >
>> >> >> > [ CloudBase is a data warehouse system built on top of Hadoop's
>> >> >> Map-Reduce
>> >> >> > architecture. It uses ANSI SQL as its query language and comes with
>> >> >> > a
>> >> >> JDBC
>> >> >> > driver. It is developed by Business.com and is released to open
>> >> >> > source community under GNU GPL license]
>> >> >> >
>> >> >> > This release fixes one issue with the 1.2 release- Table Indexing
>> >> >> > feature was not enabled in the 1.2 release. This release fixes this
>> >> issue.
>> >> >> >
>> >> >> > Also we have updated the svn repository on the sourceforge site and
>> >> >> > we invite contributors to work with us to improve CloudBase. The
>> svn
>> >> >> > repository url is-
>> >> >> > https://cloudbase.svn.sourceforge.net/svnroot/cloudbase/trunk
>> >> >> >
>> >> >> > We will be uploading Developer's guide/documentation on the
>> >> >> > CloudBase website very soon. Meanwhile, if someone wants to try
>> >> >> > compiling the code and play around with it, please contact me, I
>> can
>> >> >> > help you get started.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Taran
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> http://blog.lukas-vlcek.com/
>> >> >>
>> >> >
>> >>
>> >
>>
>

Re: Announcing CloudBase-1.2.1 release

Posted by Tarandeep Singh <ta...@gmail.com>.

Hi Tim,

I am currently writing developer's guide for CloudBase. It will explain the
CloudBase design in detail as well as the algorithms used to do Joins,
Indexing etc.

However, for indexing here is a brief introduction which you can also find
on the CloudBase website-
http://cloudbase.sourceforge.net/index.html#userDoc

-------------------------------------------------------------------
CloudBase supports Hash Indexing on any column of a table. Index can be
created on multiple columns as well (separate index for each column). During
indexing process, CloudBase creates an Inverted Index for column values. An
Inverted Index can be viewed as a persistent HashMap where keys are the
column values and values are the lists containing information about Data
Blocks containing the column values.

CloudBase also stores the MIN and MAX value of a column (if its type is
numeric) present in a Data Block. These values are useful for range queries.

During query execution process, if an indexed column is present in a WHERE
clause (for example WHERE c1 = 100 or c2 > 100) then CloudBase will locate
all those Data Blocks (from its index data) where these column values are
present and load only those Data Blocks instead of loading all Data Blocks.

This indexing is very useful when query involves filtering on date or date
range and we have seen performance improvements of up to 97% in such cases.
-------------------------------------------------------------------


For Joins, I have not explained the algorithm in the documentation, however
I am planning to do it so very soon. However, the process is explained in
the code comments. For your convenience I am pasting the comment from the
code here-

----------------------------------------------------
  /**
   * Executes Map/Reduce job to handle join (inner or outer join) between
two
   * tables/subqueries.
   * <p>
   * It uses semi-join approach to optimize the join process. Join algorithm
for
   * inner join is as follows- first smaller table is reduced using the
columns
   * participating in join condition as Keys (for MR job). Also,
bloom-filter
   * for columns participating in join is constructed which is used to
filter
   * out the rows of bigger table during the map process(later). Bigger
table is also
   * reduced using the join columns as Keys. When rows of bigger table reach
   * reducers, rows of smaller table are read back from HDFS and join is
   * performed.
   *<p>
   * This algorithm makes use of one property of Map-Reduce paradigm - Same
keys
   * go to one reducer. So this means, rows (of both tables) having same
values
   * of columns participating in join will go to same reducer (as join
columns
   * are used as Keys) and hence a join can be easily performed at reducer.
   * Although, it seems reading rows of smaller table from HDFS and
performing
   * join will be in-efficient, it is not. There are two reasons for the
same-
   * first, Hadoop tries to store a copy of the data at the node which
produced
   * that data. So this means, a reducer will be reading a local copy of the

   * rows of smaller table. Second reason is, a Map-Reduce job, sort the
keys
   * so this means, rows of both smaller and bigger table are sorted on the
   * join columns. This makes it easier to read rows and perform join.
   * <p>
   * In case of left outer join, left table is always reduced first. In case
   * of right outer join, right table is reduced first.
   *
   * @see http://en.wikipedia.org/wiki/Relational_algebra#Semijoin
   * @see http://en.wikipedia.org/wiki/Bloom_filter
   */
----------------------------------------------------

Thanks,
Taran

2009/3/3 tim robertson <ti...@gmail.com>

> Hi Taran,
>
> Have you a blog or something that explains how you process joins on
> cloudbase?
>
> E.g. how are indexes used, and how do you go through the joining using
> the data files and index files.
> Do you look at all possible indexes, determine the cardinality of each
> and from this pick a join order, or do you start at both ends of the
> query and meet in the middle (if you know what I mean...).  This
> really influences schema design for large datasets in mysql (e.g. you
> need to store your own cardinality as mysql can't determine the best
> join order inherently) so I am wondering about porting my reporting
> application.
>
> I think this kind of info would be great for cloudbase docs.
>
> Cheers,
>
> Tim
>
>
> 2009/3/3 Tarandeep Singh <ta...@gmail.com>:
> > Tim is right. CloudBase is not equivalent to HBase.
> >
> > HBase is column oriented database based on Google's BigTable. CloudBase
> is a
> > database/data warehosue layer on top of Hadoop and by means of its SQL
> > interface makes it easier to mine logs. So instead of writing Map-Reduce
> > jobs for analyzing data, one can use SQL to do the same and SQL to Map
> > Reduce job translation is handled by CloudBase.
> >
> > -Taran
> >
> > 2009/3/3 tim robertson <ti...@gmail.com>
> >
> >> Hi Praveen,
> >>
> >> I think it is more equivalent to Hive than HBase - both offer joins
> >> and structured querying where HBase is more a column oriented data
> >> store with many to ones embedded in a single row and (currently) only
> >> indexes on the primary key, but secondary keys are coming.  I
> >> anticipate using HBase as a back end to harvest into, but might make
> >> use of Hive or Cloudbase for ad hoc reporting when needed.
> >>
> >> Has anyone done any testing of Hive vs. Cloudbase for performance and
> >> comparison of features?
> >>
> >> Cheers,
> >>
> >> Tim
> >>
> >>
> >> 2009/3/3 Guttikonda, Praveen <pr...@hp.com>:
> >> > Hi ,
> >> > Will this be competing in a sense with HBASE then ?
> >> >
> >> > Cheers,
> >> > Praveen
> >> >
> >> > -----Original Message-----
> >> > From: Tarandeep Singh [mailto:tarandeep@gmail.com]
> >> > Sent: Tuesday, March 03, 2009 10:12 PM
> >> > To: core-user@hadoop.apache.org
> >> > Subject: Re: Announcing CloudBase-1.2.1 release
> >> >
> >> > Hi Lukas,
> >> >
> >> > Yes, you are right. As of now, CloudBase does not support unique keys
> and
> >> foreign keys on tables. CloudBase is designed as a database abstraction
> >> layer on top of Hadoop, thus making it easier to query/mine logs/huge
> data
> >> easily.
> >> >
> >> > -Taran
> >> >
> >> >
> >> > On Tue, Mar 3, 2009 at 1:15 AM, Lukáš Vlček <lu...@gmail.com>
> >> wrote:
> >> >
> >> >> Hi Taran,
> >> >> This looks impressive. I quickly looked at the documentation, am I
> >> >> right that it does not support unique keys and foreign keys for
> tables?
> >> >>
> >> >> Regards,
> >> >> Lukas
> >> >>
> >> >> On Mon, Mar 2, 2009 at 8:33 PM, Tarandeep Singh <tarandeep@gmail.com
> >
> >> >> wrote:
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > We have just released 1.2.1 version of CloudBase on sourceforge-
> >> >> > http://cloudbase.sourceforge.net
> >> >> >
> >> >> > [ CloudBase is a data warehouse system built on top of Hadoop's
> >> >> Map-Reduce
> >> >> > architecture. It uses ANSI SQL as its query language and comes with
> >> >> > a
> >> >> JDBC
> >> >> > driver. It is developed by Business.com and is released to open
> >> >> > source community under GNU GPL license]
> >> >> >
> >> >> > This release fixes one issue with the 1.2 release- Table Indexing
> >> >> > feature was not enabled in the 1.2 release. This release fixes this
> >> issue.
> >> >> >
> >> >> > Also we have updated the svn repository on the sourceforge site and
> >> >> > we invite contributors to work with us to improve CloudBase. The
> svn
> >> >> > repository url is-
> >> >> > https://cloudbase.svn.sourceforge.net/svnroot/cloudbase/trunk
> >> >> >
> >> >> > We will be uploading Developer's guide/documentation on the
> >> >> > CloudBase website very soon. Meanwhile, if someone wants to try
> >> >> > compiling the code and play around with it, please contact me, I
> can
> >> >> > help you get started.
> >> >> >
> >> >> > Thanks,
> >> >> > Taran
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> http://blog.lukas-vlcek.com/
> >> >>
> >> >
> >>
> >
>

Re: Announcing CloudBase-1.2.1 release

Posted by tim robertson <ti...@gmail.com>.

Hi Taran,

Have you a blog or something that explains how you process joins on cloudbase?

E.g. how are indexes used, and how do you go through the joining using
the data files and index files.
Do you look at all possible indexes, determine the cardinality of each
and from this pick a join order, or do you start at both ends of the
query and meet in the middle (if you know what I mean...).  This
really influences schema design for large datasets in mysql (e.g. you
need to store your own cardinality as mysql can't determine the best
join order inherently) so I am wondering about porting my reporting
application.

I think this kind of info would be great for cloudbase docs.

Cheers,

Tim


2009/3/3 Tarandeep Singh <ta...@gmail.com>:
> Tim is right. CloudBase is not equivalent to HBase.
>
> HBase is column oriented database based on Google's BigTable. CloudBase is a
> database/data warehosue layer on top of Hadoop and by means of its SQL
> interface makes it easier to mine logs. So instead of writing Map-Reduce
> jobs for analyzing data, one can use SQL to do the same and SQL to Map
> Reduce job translation is handled by CloudBase.
>
> -Taran
>
> 2009/3/3 tim robertson <ti...@gmail.com>
>
>> Hi Praveen,
>>
>> I think it is more equivalent to Hive than HBase - both offer joins
>> and structured querying where HBase is more a column oriented data
>> store with many to ones embedded in a single row and (currently) only
>> indexes on the primary key, but secondary keys are coming.  I
>> anticipate using HBase as a back end to harvest into, but might make
>> use of Hive or Cloudbase for ad hoc reporting when needed.
>>
>> Has anyone done any testing of Hive vs. Cloudbase for performance and
>> comparison of features?
>>
>> Cheers,
>>
>> Tim
>>
>>
>> 2009/3/3 Guttikonda, Praveen <pr...@hp.com>:
>> > Hi ,
>> > Will this be competing in a sense with HBASE then ?
>> >
>> > Cheers,
>> > Praveen
>> >
>> > -----Original Message-----
>> > From: Tarandeep Singh [mailto:tarandeep@gmail.com]
>> > Sent: Tuesday, March 03, 2009 10:12 PM
>> > To: core-user@hadoop.apache.org
>> > Subject: Re: Announcing CloudBase-1.2.1 release
>> >
>> > Hi Lukas,
>> >
>> > Yes, you are right. As of now, CloudBase does not support unique keys and
>> foreign keys on tables. CloudBase is designed as a database abstraction
>> layer on top of Hadoop, thus making it easier to query/mine logs/huge data
>> easily.
>> >
>> > -Taran
>> >
>> >
>> > On Tue, Mar 3, 2009 at 1:15 AM, Lukáš Vlček <lu...@gmail.com>
>> wrote:
>> >
>> >> Hi Taran,
>> >> This looks impressive. I quickly looked at the documentation, am I
>> >> right that it does not support unique keys and foreign keys for tables?
>> >>
>> >> Regards,
>> >> Lukas
>> >>
>> >> On Mon, Mar 2, 2009 at 8:33 PM, Tarandeep Singh <ta...@gmail.com>
>> >> wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > We have just released 1.2.1 version of CloudBase on sourceforge-
>> >> > http://cloudbase.sourceforge.net
>> >> >
>> >> > [ CloudBase is a data warehouse system built on top of Hadoop's
>> >> Map-Reduce
>> >> > architecture. It uses ANSI SQL as its query language and comes with
>> >> > a
>> >> JDBC
>> >> > driver. It is developed by Business.com and is released to open
>> >> > source community under GNU GPL license]
>> >> >
>> >> > This release fixes one issue with the 1.2 release- Table Indexing
>> >> > feature was not enabled in the 1.2 release. This release fixes this
>> issue.
>> >> >
>> >> > Also we have updated the svn repository on the sourceforge site and
>> >> > we invite contributors to work with us to improve CloudBase. The svn
>> >> > repository url is-
>> >> > https://cloudbase.svn.sourceforge.net/svnroot/cloudbase/trunk
>> >> >
>> >> > We will be uploading Developer's guide/documentation on the
>> >> > CloudBase website very soon. Meanwhile, if someone wants to try
>> >> > compiling the code and play around with it, please contact me, I can
>> >> > help you get started.
>> >> >
>> >> > Thanks,
>> >> > Taran
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> http://blog.lukas-vlcek.com/
>> >>
>> >
>>
>

Re: Announcing CloudBase-1.2.1 release

Posted by Tarandeep Singh <ta...@gmail.com>.

Tim is right. CloudBase is not equivalent to HBase.

HBase is column oriented database based on Google's BigTable. CloudBase is a
database/data warehosue layer on top of Hadoop and by means of its SQL
interface makes it easier to mine logs. So instead of writing Map-Reduce
jobs for analyzing data, one can use SQL to do the same and SQL to Map
Reduce job translation is handled by CloudBase.

-Taran

2009/3/3 tim robertson <ti...@gmail.com>

> Hi Praveen,
>
> I think it is more equivalent to Hive than HBase - both offer joins
> and structured querying where HBase is more a column oriented data
> store with many to ones embedded in a single row and (currently) only
> indexes on the primary key, but secondary keys are coming.  I
> anticipate using HBase as a back end to harvest into, but might make
> use of Hive or Cloudbase for ad hoc reporting when needed.
>
> Has anyone done any testing of Hive vs. Cloudbase for performance and
> comparison of features?
>
> Cheers,
>
> Tim
>
>
> 2009/3/3 Guttikonda, Praveen <pr...@hp.com>:
> > Hi ,
> > Will this be competing in a sense with HBASE then ?
> >
> > Cheers,
> > Praveen
> >
> > -----Original Message-----
> > From: Tarandeep Singh [mailto:tarandeep@gmail.com]
> > Sent: Tuesday, March 03, 2009 10:12 PM
> > To: core-user@hadoop.apache.org
> > Subject: Re: Announcing CloudBase-1.2.1 release
> >
> > Hi Lukas,
> >
> > Yes, you are right. As of now, CloudBase does not support unique keys and
> foreign keys on tables. CloudBase is designed as a database abstraction
> layer on top of Hadoop, thus making it easier to query/mine logs/huge data
> easily.
> >
> > -Taran
> >
> >
> > On Tue, Mar 3, 2009 at 1:15 AM, Lukáš Vlček <lu...@gmail.com>
> wrote:
> >
> >> Hi Taran,
> >> This looks impressive. I quickly looked at the documentation, am I
> >> right that it does not support unique keys and foreign keys for tables?
> >>
> >> Regards,
> >> Lukas
> >>
> >> On Mon, Mar 2, 2009 at 8:33 PM, Tarandeep Singh <ta...@gmail.com>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > We have just released 1.2.1 version of CloudBase on sourceforge-
> >> > http://cloudbase.sourceforge.net
> >> >
> >> > [ CloudBase is a data warehouse system built on top of Hadoop's
> >> Map-Reduce
> >> > architecture. It uses ANSI SQL as its query language and comes with
> >> > a
> >> JDBC
> >> > driver. It is developed by Business.com and is released to open
> >> > source community under GNU GPL license]
> >> >
> >> > This release fixes one issue with the 1.2 release- Table Indexing
> >> > feature was not enabled in the 1.2 release. This release fixes this
> issue.
> >> >
> >> > Also we have updated the svn repository on the sourceforge site and
> >> > we invite contributors to work with us to improve CloudBase. The svn
> >> > repository url is-
> >> > https://cloudbase.svn.sourceforge.net/svnroot/cloudbase/trunk
> >> >
> >> > We will be uploading Developer's guide/documentation on the
> >> > CloudBase website very soon. Meanwhile, if someone wants to try
> >> > compiling the code and play around with it, please contact me, I can
> >> > help you get started.
> >> >
> >> > Thanks,
> >> > Taran
> >> >
> >>
> >>
> >>
> >> --
> >> http://blog.lukas-vlcek.com/
> >>
> >
>

Re: Announcing CloudBase-1.2.1 release

Posted by tim robertson <ti...@gmail.com>.

Hi Praveen,

I think it is more equivalent to Hive than HBase - both offer joins
and structured querying where HBase is more a column oriented data
store with many to ones embedded in a single row and (currently) only
indexes on the primary key, but secondary keys are coming.  I
anticipate using HBase as a back end to harvest into, but might make
use of Hive or Cloudbase for ad hoc reporting when needed.

Has anyone done any testing of Hive vs. Cloudbase for performance and
comparison of features?

Cheers,

Tim


2009/3/3 Guttikonda, Praveen <pr...@hp.com>:
> Hi ,
> Will this be competing in a sense with HBASE then ?
>
> Cheers,
> Praveen
>
> -----Original Message-----
> From: Tarandeep Singh [mailto:tarandeep@gmail.com]
> Sent: Tuesday, March 03, 2009 10:12 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Announcing CloudBase-1.2.1 release
>
> Hi Lukas,
>
> Yes, you are right. As of now, CloudBase does not support unique keys and foreign keys on tables. CloudBase is designed as a database abstraction layer on top of Hadoop, thus making it easier to query/mine logs/huge data easily.
>
> -Taran
>
>
> On Tue, Mar 3, 2009 at 1:15 AM, Lukáš Vlček <lu...@gmail.com> wrote:
>
>> Hi Taran,
>> This looks impressive. I quickly looked at the documentation, am I
>> right that it does not support unique keys and foreign keys for tables?
>>
>> Regards,
>> Lukas
>>
>> On Mon, Mar 2, 2009 at 8:33 PM, Tarandeep Singh <ta...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > We have just released 1.2.1 version of CloudBase on sourceforge-
>> > http://cloudbase.sourceforge.net
>> >
>> > [ CloudBase is a data warehouse system built on top of Hadoop's
>> Map-Reduce
>> > architecture. It uses ANSI SQL as its query language and comes with
>> > a
>> JDBC
>> > driver. It is developed by Business.com and is released to open
>> > source community under GNU GPL license]
>> >
>> > This release fixes one issue with the 1.2 release- Table Indexing
>> > feature was not enabled in the 1.2 release. This release fixes this issue.
>> >
>> > Also we have updated the svn repository on the sourceforge site and
>> > we invite contributors to work with us to improve CloudBase. The svn
>> > repository url is-
>> > https://cloudbase.svn.sourceforge.net/svnroot/cloudbase/trunk
>> >
>> > We will be uploading Developer's guide/documentation on the
>> > CloudBase website very soon. Meanwhile, if someone wants to try
>> > compiling the code and play around with it, please contact me, I can
>> > help you get started.
>> >
>> > Thanks,
>> > Taran
>> >
>>
>>
>>
>> --
>> http://blog.lukas-vlcek.com/
>>
>

RE: Announcing CloudBase-1.2.1 release

Posted by "Guttikonda, Praveen" <pr...@hp.com>.

Hi ,
Will this be competing in a sense with HBASE then ?

Cheers,
Praveen

-----Original Message-----
From: Tarandeep Singh [mailto:tarandeep@gmail.com]
Sent: Tuesday, March 03, 2009 10:12 PM
To: core-user@hadoop.apache.org
Subject: Re: Announcing CloudBase-1.2.1 release

Hi Lukas,

Yes, you are right. As of now, CloudBase does not support unique keys and foreign keys on tables. CloudBase is designed as a database abstraction layer on top of Hadoop, thus making it easier to query/mine logs/huge data easily.

-Taran


On Tue, Mar 3, 2009 at 1:15 AM, Lukáš Vlček <lu...@gmail.com> wrote:

> Hi Taran,
> This looks impressive. I quickly looked at the documentation, am I
> right that it does not support unique keys and foreign keys for tables?
>
> Regards,
> Lukas
>
> On Mon, Mar 2, 2009 at 8:33 PM, Tarandeep Singh <ta...@gmail.com>
> wrote:
>
> > Hi,
> >
> > We have just released 1.2.1 version of CloudBase on sourceforge-
> > http://cloudbase.sourceforge.net
> >
> > [ CloudBase is a data warehouse system built on top of Hadoop's
> Map-Reduce
> > architecture. It uses ANSI SQL as its query language and comes with
> > a
> JDBC
> > driver. It is developed by Business.com and is released to open
> > source community under GNU GPL license]
> >
> > This release fixes one issue with the 1.2 release- Table Indexing
> > feature was not enabled in the 1.2 release. This release fixes this issue.
> >
> > Also we have updated the svn repository on the sourceforge site and
> > we invite contributors to work with us to improve CloudBase. The svn
> > repository url is-
> > https://cloudbase.svn.sourceforge.net/svnroot/cloudbase/trunk
> >
> > We will be uploading Developer's guide/documentation on the
> > CloudBase website very soon. Meanwhile, if someone wants to try
> > compiling the code and play around with it, please contact me, I can
> > help you get started.
> >
> > Thanks,
> > Taran
> >
>
>
>
> --
> http://blog.lukas-vlcek.com/
>

Re: Announcing CloudBase-1.2.1 release

Posted by Tarandeep Singh <ta...@gmail.com>.

Hi Lukas,

Yes, you are right. As of now, CloudBase does not support unique keys and
foreign keys on tables. CloudBase is designed as a database abstraction
layer on top of Hadoop, thus making it easier to query/mine logs/huge data
easily.

-Taran


On Tue, Mar 3, 2009 at 1:15 AM, Lukáš Vlček <lu...@gmail.com> wrote:

> Hi Taran,
> This looks impressive. I quickly looked at the documentation, am I right
> that it does not support unique keys and foreign keys for tables?
>
> Regards,
> Lukas
>
> On Mon, Mar 2, 2009 at 8:33 PM, Tarandeep Singh <ta...@gmail.com>
> wrote:
>
> > Hi,
> >
> > We have just released 1.2.1 version of CloudBase on sourceforge-
> > http://cloudbase.sourceforge.net
> >
> > [ CloudBase is a data warehouse system built on top of Hadoop's
> Map-Reduce
> > architecture. It uses ANSI SQL as its query language and comes with a
> JDBC
> > driver. It is developed by Business.com and is released to open source
> > community under GNU GPL license]
> >
> > This release fixes one issue with the 1.2 release- Table Indexing feature
> > was not enabled in the 1.2 release. This release fixes this issue.
> >
> > Also we have updated the svn repository on the sourceforge site and we
> > invite contributors to work with us to improve CloudBase. The svn
> > repository
> > url is-
> > https://cloudbase.svn.sourceforge.net/svnroot/cloudbase/trunk
> >
> > We will be uploading Developer's guide/documentation on the CloudBase
> > website very soon. Meanwhile, if someone wants to try compiling the code
> > and
> > play around with it, please contact me, I can help you get started.
> >
> > Thanks,
> > Taran
> >
>
>
>
> --
> http://blog.lukas-vlcek.com/
>

Re: Announcing CloudBase-1.2.1 release

Posted by Lukáš Vlček <lu...@gmail.com>.

Hi Taran,
This looks impressive. I quickly looked at the documentation, am I right
that it does not support unique keys and foreign keys for tables?

Regards,
Lukas

On Mon, Mar 2, 2009 at 8:33 PM, Tarandeep Singh <ta...@gmail.com> wrote:

> Hi,
>
> We have just released 1.2.1 version of CloudBase on sourceforge-
> http://cloudbase.sourceforge.net
>
> [ CloudBase is a data warehouse system built on top of Hadoop's Map-Reduce
> architecture. It uses ANSI SQL as its query language and comes with a JDBC
> driver. It is developed by Business.com and is released to open source
> community under GNU GPL license]
>
> This release fixes one issue with the 1.2 release- Table Indexing feature
> was not enabled in the 1.2 release. This release fixes this issue.
>
> Also we have updated the svn repository on the sourceforge site and we
> invite contributors to work with us to improve CloudBase. The svn
> repository
> url is-
> https://cloudbase.svn.sourceforge.net/svnroot/cloudbase/trunk
>
> We will be uploading Developer's guide/documentation on the CloudBase
> website very soon. Meanwhile, if someone wants to try compiling the code
> and
> play around with it, please contact me, I can help you get started.
>
> Thanks,
> Taran
>



-- 
http://blog.lukas-vlcek.com/