Posted to user@hbase.apache.org by Aaron Kimball <ak...@gmail.com> on 2012/11/14 23:18:36 UTC

Announcing KijiSchema for HBase schema management

HBase fans,

I’m writing to announce the first release of KijiSchema, a new project to
help developers build applications on HBase. You can download it at
www.kiji.org. It is open source and published under the Apache 2 license.

KijiSchema simplifies the development of applications on HBase by providing
developer-friendly Java APIs for storing and managing typed data using Avro.

As an application grows, developers can gracefully evolve its schema at
the cell level to handle new fields. This makes KijiSchema particularly well
suited to entity-centric data schemas, in which all information about a
given entity, including dimensional and transaction data, is encoded within
the same row.
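
A simplified model of how cell-level evolution can handle new fields: when a
cell written under an older schema is read with a newer one, fields the
reader adds are filled from their defaults, and fields the reader dropped are
ignored. This is the rule Avro's schema resolution applies. The sketch below
is plain Java with maps standing in for Avro records; it does not use the
Avro library and is not KijiSchema's code. The class name, the field names,
and the convention that a null default marks a required field are all
illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified model of Avro-style reader/writer resolution for added fields. */
public class SchemaEvolutionSketch {
    /**
     * Resolves a record written under an older schema against the current reader schema.
     * readerDefaults maps each reader field (in order) to its default; a null default means
     * "required" in this sketch (real Avro also allows null as a legitimate default).
     */
    public static Map<String, Object> resolve(Map<String, Object> written,
                                              Map<String, Object> readerDefaults) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerDefaults.entrySet()) {
            String name = field.getKey();
            if (written.containsKey(name)) {
                result.put(name, written.get(name));   // the writer supplied this field
            } else if (field.getValue() != null) {
                result.put(name, field.getValue());    // field added later: use its default
            } else {
                throw new IllegalArgumentException("No value or default for field " + name);
            }
        }
        // Fields present in the written data but absent from the reader schema are ignored.
        return result;
    }

    public static void main(String[] args) {
        // Cell written by an older version of the application, before "country" existed.
        Map<String, Object> oldCell = new LinkedHashMap<>();
        oldCell.put("email", "alice@example.com");
        oldCell.put("signupMs", 1000L);

        // Current reader schema adds "country" with a default value.
        Map<String, Object> reader = new LinkedHashMap<>();
        reader.put("email", null);
        reader.put("signupMs", null);
        reader.put("country", "unknown");

        System.out.println(resolve(oldCell, reader)); // {email=alice@example.com, signupMs=1000, country=unknown}
    }
}
```

Old cells never need rewriting under this rule: the default is applied at
read time, which is what lets a schema change land without a data migration.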

Column names and associations of columns with schemas are maintained in a
data dictionary; developers don’t need to rely on reading source code to
remember where data is stored.

Table schemas can be defined in JSON or with KijiSchema’s declarative DDL.
Developers can also run MapReduce jobs over Kiji tables in HBase using the
included MapReduce InputFormats and OutputFormats.

KijiSchema is an open and highly modular system. It runs on top of an
existing HBase 0.92 (CDH4) cluster and can run entirely on the client,
with no server-side daemons. KijiSchema can also be downloaded as part of a
Kiji BentoBox, which installs a clean mini-cluster of Hadoop, HBase and Kiji
on your laptop in under 15 minutes. You do not need to have Hadoop or HBase
pre-installed to run the BentoBox.

KijiSchema is inspired by work we have done at WibiData developing
applications for recommendations and personalization on top of HBase. We
will be developing and releasing other components in the Kiji project that
make it easier to build data applications on HBase, including improved
MapReduce support and querying tools. We welcome feedback and contributions
from the community to the Kiji Project at www.kiji.org.

Regards,
- Aaron Kimball

Re: Announcing KijiSchema for HBase schema management

Posted by Lee Sheng <ls...@wibidata.com>.
Hi Asaf,

The row keys (EntityIds) in Kiji act as a translation layer between some
unique identifier (possibly, but not necessarily, derived from the row data)
and the HBase row keys. Composite row keys are not supported yet. Depending
on what you're trying to do, some of the KijiRowFilters may be enough for
partial matching of the data in scanners.

Lee
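
To make the idea of a translation layer concrete, here is one common way to
derive an HBase row key from an application identifier: hash the identifier
so that keys spread evenly across regions. This is an assumption for
illustration only; the thread does not say how KijiSchema encodes EntityIds,
and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical translation from an application-level ID to a hashed HBase row key. */
public class HashedEntityId {
    /** Returns the 16-byte MD5 digest of the identifier's UTF-8 bytes. */
    public static byte[] toRowKey(String entityId) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return md5.digest(entityId.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE implementation is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = toRowKey("user-123");
        System.out.println(key.length);                                // 16
        System.out.println(Arrays.equals(key, toRowKey("user-123")));  // true: deterministic
        System.out.println(Arrays.equals(key, toRowKey("user-124")));  // false
    }
}
```

A hashed key gives up the natural ordering of the identifier, so prefix or
range scans over the original ID are no longer possible with a key like this.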

On Thu, Nov 15, 2012 at 11:19 AM, Asaf Mesika <as...@gmail.com> wrote:
> Thanks, that's great! Truly an awesome project.
> Is there a way to specify a composite row key composed of fields from the
> table schema, much like defining a primary key on an Oracle table?
> For example, a row key could look like: (CustomerID)(StartTimeMs)(RequestId)

Re: Announcing KijiSchema for HBase schema management

Posted by Asaf Mesika <as...@gmail.com>.
Thanks, that's great! Truly an awesome project.
Is there a way to specify a composite row key composed of fields from the
table schema, much like defining a primary key on an Oracle table?
For example, a row key could look like: (CustomerID)(StartTimeMs)(RequestId)
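
KijiSchema did not support composite keys at the time (see Lee's reply), but
in plain HBase a key like this is commonly built by concatenating fixed-width,
big-endian fields so that byte order matches the intended sort order. A
minimal sketch, assuming all three fields are 64-bit integers; the class and
method names are hypothetical and this is not a Kiji API.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical helper for a fixed-width (CustomerID)(StartTimeMs)(RequestId) HBase row key. */
public class CompositeRowKey {
    // Flipping the sign bit makes unsigned byte order (HBase's row-key order) match signed numeric order.
    private static void putOrderedLong(ByteBuffer buf, long value) {
        buf.putLong(value ^ Long.MIN_VALUE);
    }

    private static long getOrderedLong(ByteBuffer buf) {
        return buf.getLong() ^ Long.MIN_VALUE;
    }

    /** Concatenates the three fields as 8-byte big-endian values: 24 bytes in total. */
    public static byte[] encode(long customerId, long startTimeMs, long requestId) {
        ByteBuffer buf = ByteBuffer.allocate(3 * Long.BYTES); // ByteBuffer is big-endian by default
        putOrderedLong(buf, customerId);
        putOrderedLong(buf, startTimeMs);
        putOrderedLong(buf, requestId);
        return buf.array();
    }

    /** The 8-byte prefix shared by every key belonging to one customer. */
    public static byte[] customerPrefix(long customerId) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        putOrderedLong(buf, customerId);
        return buf.array();
    }

    /** Recovers {customerId, startTimeMs, requestId} from an encoded key. */
    public static long[] decode(byte[] key) {
        ByteBuffer buf = ByteBuffer.wrap(key);
        return new long[] {getOrderedLong(buf), getOrderedLong(buf), getOrderedLong(buf)};
    }

    public static void main(String[] args) {
        byte[] a = encode(42L, 1000L, 7L);
        byte[] b = encode(42L, 1001L, 1L);
        // Arrays.compareUnsigned (Java 9+) compares bytes as unsigned values.
        System.out.println(Arrays.compareUnsigned(a, b) < 0); // true: earlier start time sorts first
        System.out.println(Arrays.toString(decode(a)));       // [42, 1000, 7]
    }
}
```

Because the customer ID comes first, a scan from customerPrefix(id) up to
customerPrefix(id + 1) visits that customer's rows in StartTimeMs order, then
RequestId order.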

On 15 Nov 2012, at 20:26, Aaron Kimball <ak...@gmail.com> wrote:

> Hi Asaf,
>
> This is a good point. Our user guide is vague on the subject, but under the
> hood, each cell actually stores an integer ID assigned to the writer schema,
> not the schema itself. KijiSchema maintains the ID-to-schema mappings in a
> metadata table (also stored in HBase) and looks them up as needed. I have
> logged https://jira.kiji.org/browse/DOCS-2 to track this improvement.
>
> Cheers,
> - Aaron

Re: Announcing KijiSchema for HBase schema management

Posted by Aaron Kimball <ak...@gmail.com>.
Hi Asaf,

This is a good point. Our user guide is vague on the subject, but under the
hood, each cell actually stores an integer ID assigned to the writer schema,
not the schema itself. KijiSchema maintains the ID-to-schema mappings in a
metadata table (also stored in HBase) and looks them up as needed. I have
logged https://jira.kiji.org/browse/DOCS-2 to track this improvement.

Cheers,
- Aaron
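
The layout described above can be sketched as a small integer schema ID
followed by the encoded value, with a separate table mapping IDs back to
schemas. This is a hypothetical, in-memory illustration: the varint encoding
of the ID, the class and method names, and the use of raw bytes in place of
Avro-encoded data are all assumptions, not KijiSchema's actual format or API.
It does show why the per-cell overhead is a byte or two rather than a full
copy of the schema.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a schema table plus an "ID prefix + value" cell layout. */
public class SchemaIdCells {
    private final Map<String, Integer> idsBySchema = new HashMap<>();
    private final List<String> schemasById = new ArrayList<>();

    /** Returns the ID already assigned to this schema, or assigns the next free one. */
    public int getOrAssignId(String schemaJson) {
        Integer id = idsBySchema.get(schemaJson);
        if (id == null) {
            id = schemasById.size();
            schemasById.add(schemaJson);
            idsBySchema.put(schemaJson, id);
        }
        return id;
    }

    /** Looks up the writer schema for an ID read from a cell. */
    public String lookup(int id) {
        return schemasById.get(id);
    }

    /** Encodes a cell as an unsigned varint schema ID followed by the value bytes. */
    public byte[] encodeCell(String writerSchemaJson, byte[] value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int id = getOrAssignId(writerSchemaJson);
        // 7 bits per byte; the high bit marks "more bytes follow".
        while ((id & ~0x7F) != 0) {
            out.write((id & 0x7F) | 0x80);
            id >>>= 7;
        }
        out.write(id);
        out.write(value, 0, value.length);
        return out.toByteArray();
    }

    /** Splits a cell back into (writer schema, value bytes). */
    public Map.Entry<String, byte[]> decodeCell(byte[] cell) {
        int id = 0;
        int shift = 0;
        int pos = 0;
        byte b;
        do {
            b = cell[pos++];
            id |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return Map.entry(lookup(id), Arrays.copyOfRange(cell, pos, cell.length));
    }

    public static void main(String[] args) {
        SchemaIdCells table = new SchemaIdCells();
        String schema = "{\"type\":\"record\",\"name\":\"User\","
                + "\"fields\":[{\"name\":\"email\",\"type\":\"string\"}]}";
        // Placeholder bytes standing in for an Avro-encoded value.
        byte[] value = "alice@example.com".getBytes(StandardCharsets.UTF_8);
        byte[] cell = table.encodeCell(schema, value);
        System.out.println(cell.length - value.length);                  // 1 byte of overhead
        System.out.println(table.decodeCell(cell).getKey().equals(schema)); // true
    }
}
```

Since the real mapping lives in an HBase metadata table, an implementation
would likely cache lookups rather than query that table for every cell; the
HashMap here plays the role of such a cache.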


On Wed, Nov 14, 2012 at 9:58 PM, Asaf Mesika <as...@gmail.com> wrote:

> Hi,
> This looks great!
>
> I have a question regarding schema. The user guide says that the schema of
> a cell is saved next to the data in the cell. I presume this would:
> - take more space, since the schema is duplicated in every row where this
>   cell is stored, and
> - make reading records slower, since the Avro schema must be parsed before
>   reading each cell.
>
> Did I manage to understand the guide correctly?
>
> Thanks!
>
> Asaf

Re: Announcing KijiSchema for HBase schema management

Posted by Asaf Mesika <as...@gmail.com>.
Hi,
This looks great!

I have a question regarding schema. The user guide says that the schema of a cell is saved next to the data in the cell. I presume this would:
- take more space, since the schema is duplicated in every row where this cell is stored, and
- make reading records slower, since the Avro schema must be parsed before reading each cell.

Did I manage to understand the guide correctly?

Thanks!

Asaf


Re: Announcing KijiSchema for HBase schema management

Posted by Ted Yu <yu...@gmail.com>.
Aaron:
I came across an interesting concept while reading the code: locality
groups.

Can you say a little more about this feature?

Thanks

Re: Announcing KijiSchema for HBase schema management

Posted by jian fan <xi...@gmail.com>.
Jfan@mail.shu.edu.cn