Posted to user@avro.apache.org by ccleve <cc...@gmail.com> on 2013/01/28 21:00:54 UTC

Is Avro/Trevni strictly read-only?

I've gone through the documentation, but haven't been able to get a 
definite answer: is Avro, or specifically Trevni, only for read-only data?

Is it possible to update or delete records?

If records can be deleted, is there any code that will merge row sets to 
get rid of the unused space?




Re: Is Avro/Trevni strictly read-only?

Posted by Aaron Kimball <ak...@gmail.com>.
Interesting -- thanks for the link! Let me know if you have any more Kiji
questions.
Cheers
- Aaron


On Wed, Jan 30, 2013 at 6:49 PM, Russell Jurney <ru...@gmail.com> wrote:

> I'm looking at Panthera, I'll check out Kiji too. Inferring the schema
> from the first record and creating a table is what is done in Voldemort's
> build/push job, so I'll look into that.
>
>
> https://github.com/voldemort/voldemort/wiki/Build-and-Push-Jobs-for-Voldemort-Read-Only-Stores
>
> Russell Jurney http://datasyndrome.com
>
> On Jan 30, 2013, at 6:33 PM, Aaron Kimball <ak...@gmail.com> wrote:
>
> Hi Russell,
>
> Great question.  Kiji is more strongly typed than systems like MongoDB.
> While your schema can evolve (using Avro evolution) without structurally
> updating existing data, you still need to specify your Avro schemas in a
> data dictionary. It's challenging to author systems in Java (as is typical
> of HBase/HDFS/MapReduce-facing applications) without some strong typing in
> the persistence layer. You wind up reading a lot of other people's code to
> figure out what types were written -- assuming you can find the code (or
> the HBase columns) in the first place.
>
> You can create table schemas either "manually" by filling out a JSON /
> Avro-based table layout specification, or you can use the DDL shell which
> lets you CREATE TABLE, ALTER TABLE, etc. in a pretty quick way. Once the
> table's set up, then you can write to it.  I think the DDL shell included
> with the bento box makes this a reasonably low-overhead process.
>
> We don't currently have any Pig integration. We've made some initial
> proof-of-concept progress on a StorageHandler that lets Hive query Kiji,
> but it's not in a ready state yet. Someone (you? :) could write a Pig
> integration; Pig already supports Avro I think. And you could even make it
> analyze the first output tuple and use that to infer types/column names to
> set up a result table with the appropriate table schema by invoking the DDL
> procedurally.
>
> Sorry I don't have a "magic wand" answer for you -- for the use cases we
> target, these sorts of setup costs often pay off in the long run, so that's
> the case we've optimized the design around. Let me know if there's anything
> else I can help with.
> Thanks,
> - Aaron
>
>
> On Wed, Jan 30, 2013 at 5:48 PM, Russell Jurney <ru...@gmail.com> wrote:
>
>> Aaron - is there a way to create a Kiji table from Pig? I'm in the habit
>> of not specifying schemas with Voldemort and MongoDB, just storing a Pig
>> relation and the schema is set in the store. If I can arrange that somehow,
>> I'm all over Kiji. Panthera is a fork :/
>>
>>
>> On Wed, Jan 30, 2013 at 3:20 PM, Aaron Kimball <ak...@gmail.com> wrote:
>>
>>> Hi ccleve,
>>>
>>> I'd definitely urge you to try out Kiji -- we who work on it think it's
>>> a pretty good fit for this specific use case. If you've got further
>>> questions about Kiji and how to use it, please send them to me, or ask the
>>> kiji user mailing list: http://www.kiji.org/getinvolved#Mailing_Lists
>>>
>>> cheers,
>>> - Aaron
>>>
>>>
>>> On Tue, Jan 29, 2013 at 3:24 PM, Doug Cutting <cu...@apache.org> wrote:
>>>
>>>> Avro and Trevni files do not support record update or delete.
>>>>
>>>> For large changing datasets you might use Kiji (http://www.kiji.org/)
>>>> to store Avro data in HBase.
>>>>
>>>> Doug
>>>>
>>>> On Mon, Jan 28, 2013 at 12:00 PM, ccleve <cc...@gmail.com> wrote:
>>>> > I've gone through the documentation, but haven't been able to get a
>>>> definite
>>>> > answer: is Avro, or specifically Trevni, only for read-only data?
>>>> >
>>>> > Is it possible to update or delete records?
>>>> >
>>>> > If records can be deleted, is there any code that will merge row sets
>>>> to get
>>>> > rid of the unused space?
>>>> >
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>>
>> --
>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>>
>
>

Re: Is Avro/Trevni strictly read-only?

Posted by Russell Jurney <ru...@gmail.com>.
I'm looking at Panthera, I'll check out Kiji too. Inferring the schema from
the first record and creating a table is what is done in Voldemort's
build/push job, so I'll look into that.

https://github.com/voldemort/voldemort/wiki/Build-and-Push-Jobs-for-Voldemort-Read-Only-Stores

Russell Jurney http://datasyndrome.com

On Jan 30, 2013, at 6:33 PM, Aaron Kimball <ak...@gmail.com> wrote:

Hi Russell,

Great question.  Kiji is more strongly typed than systems like MongoDB.
While your schema can evolve (using Avro evolution) without structurally
updating existing data, you still need to specify your Avro schemas in a
data dictionary. It's challenging to author systems in Java (as is typical
of HBase/HDFS/MapReduce-facing applications) without some strong typing in
the persistence layer. You wind up reading a lot of other people's code to
figure out what types were written -- assuming you can find the code (or
the HBase columns) in the first place.

You can create table schemas either "manually" by filling out a JSON /
Avro-based table layout specification, or you can use the DDL shell which
lets you CREATE TABLE, ALTER TABLE, etc. in a pretty quick way. Once the
table's set up, then you can write to it.  I think the DDL shell included
with the bento box makes this a reasonably low-overhead process.

We don't currently have any Pig integration. We've made some initial
proof-of-concept progress on a StorageHandler that lets Hive query Kiji,
but it's not in a ready state yet. Someone (you? :) could write a Pig
integration; Pig already supports Avro I think. And you could even make it
analyze the first output tuple and use that to infer types/column names to
set up a result table with the appropriate table schema by invoking the DDL
procedurally.

Sorry I don't have a "magic wand" answer for you -- for the use cases we
target, these sorts of setup costs often pay off in the long run, so that's
the case we've optimized the design around. Let me know if there's anything
else I can help with.
Thanks,
- Aaron


On Wed, Jan 30, 2013 at 5:48 PM, Russell Jurney <ru...@gmail.com> wrote:

> Aaron - is there a way to create a Kiji table from Pig? I'm in the habit
> of not specifying schemas with Voldemort and MongoDB, just storing a Pig
> relation and the schema is set in the store. If I can arrange that somehow,
> I'm all over Kiji. Panthera is a fork :/
>
>
> On Wed, Jan 30, 2013 at 3:20 PM, Aaron Kimball <ak...@gmail.com> wrote:
>
>> Hi ccleve,
>>
>> I'd definitely urge you to try out Kiji -- we who work on it think it's a
>> pretty good fit for this specific use case. If you've got further questions
>> about Kiji and how to use it, please send them to me, or ask the kiji user
>> mailing list: http://www.kiji.org/getinvolved#Mailing_Lists
>>
>> cheers,
>> - Aaron
>>
>>
>> On Tue, Jan 29, 2013 at 3:24 PM, Doug Cutting <cu...@apache.org> wrote:
>>
>>> Avro and Trevni files do not support record update or delete.
>>>
>>> For large changing datasets you might use Kiji (http://www.kiji.org/)
>>> to store Avro data in HBase.
>>>
>>> Doug
>>>
>>> On Mon, Jan 28, 2013 at 12:00 PM, ccleve <cc...@gmail.com> wrote:
>>> > I've gone through the documentation, but haven't been able to get a
>>> definite
>>> > answer: is Avro, or specifically Trevni, only for read-only data?
>>> >
>>> > Is it possible to update or delete records?
>>> >
>>> > If records can be deleted, is there any code that will merge row sets
>>> to get
>>> > rid of the unused space?
>>> >
>>> >
>>> >
>>>
>>
>>
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>

Re: Is Avro/Trevni strictly read-only?

Posted by Aaron Kimball <ak...@gmail.com>.
Hi Russell,

Great question.  Kiji is more strongly typed than systems like MongoDB.
While your schema can evolve (using Avro evolution) without structurally
updating existing data, you still need to specify your Avro schemas in a
data dictionary. It's challenging to author systems in Java (as is typical
of HBase/HDFS/MapReduce-facing applications) without some strong typing in
the persistence layer. You wind up reading a lot of other people's code to
figure out what types were written -- assuming you can find the code (or
the HBase columns) in the first place.
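
To make the evolution point concrete, here is a minimal sketch using the
plain Avro generic API rather than anything Kiji-specific. The User record
and its fields are made up for illustration; the only trick shown is that
the evolved reader schema adds a field with a default, so a record written
under the old schema still decodes without being rewritten:

import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class EvolutionSketch {
  public static void main(String[] args) throws Exception {
    // Schema the existing data was written with.
    Schema writerSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"}]}");

    // Evolved schema: adds a field with a default, so old records
    // still resolve without rewriting anything on disk.
    Schema readerSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"email\",\"type\":[\"null\",\"string\"],\"default\":null}]}");

    // Write one record under the old schema.
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(bytes, null);
    GenericRecord oldRecord = new GenericData.Record(writerSchema);
    oldRecord.put("name", "ccleve");
    new GenericDatumWriter<GenericRecord>(writerSchema).write(oldRecord, encoder);
    encoder.flush();

    // Read it back under the new schema; the missing field gets its default.
    BinaryDecoder decoder =
        DecoderFactory.get().binaryDecoder(bytes.toByteArray(), null);
    GenericRecord evolved =
        new GenericDatumReader<GenericRecord>(writerSchema, readerSchema)
            .read(null, decoder);
    System.out.println(evolved);  // {"name": "ccleve", "email": null}
  }
}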

You can create table schemas either "manually" by filling out a JSON /
Avro-based table layout specification, or you can use the DDL shell which
lets you CREATE TABLE, ALTER TABLE, etc. in a pretty quick way. Once the
table's set up, then you can write to it.  I think the DDL shell included
with the bento box makes this a reasonably low-overhead process.

We don't currently have any Pig integration. We've made some initial
proof-of-concept progress on a StorageHandler that lets Hive query Kiji,
but it's not in a ready state yet. Someone (you? :) could write a Pig
integration; Pig already supports Avro I think. And you could even make it
analyze the first output tuple and use that to infer types/column names to
set up a result table with the appropriate table schema by invoking the DDL
procedurally.
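
As a sketch of the "infer types from the first output tuple" idea, assuming
an Avro release that ships SchemaBuilder: the field names and types below
are hypothetical stand-ins for whatever the first tuple happens to contain,
and the Pig-to-Avro type mapping and the DDL call are left out:

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class InferSchemaSketch {

  // Hypothetical: build a record schema from the names and types observed
  // in the first output tuple. Real code would inspect the Pig tuple and
  // map each Pig type to an Avro type before issuing the CREATE TABLE.
  public static Schema fromFirstTuple() {
    return SchemaBuilder.record("FirstTuple")
        .fields()
        .requiredLong("user_id")
        .requiredString("name")
        .optionalDouble("score")
        .endRecord();
  }

  public static void main(String[] args) {
    System.out.println(fromFirstTuple().toString(true));
  }
}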

Sorry I don't have a "magic wand" answer for you -- for the use cases we
target, these sorts of setup costs often pay off in the long run, so that's
the case we've optimized the design around. Let me know if there's anything
else I can help with.
Thanks,
- Aaron


On Wed, Jan 30, 2013 at 5:48 PM, Russell Jurney <ru...@gmail.com> wrote:

> Aaron - is there a way to create a Kiji table from Pig? I'm in the habit
> of not specifying schemas with Voldemort and MongoDB, just storing a Pig
> relation and the schema is set in the store. If I can arrange that somehow,
> I'm all over Kiji. Panthera is a fork :/
>
>
> On Wed, Jan 30, 2013 at 3:20 PM, Aaron Kimball <ak...@gmail.com> wrote:
>
>> Hi ccleve,
>>
>> I'd definitely urge you to try out Kiji -- we who work on it think it's a
>> pretty good fit for this specific use case. If you've got further questions
>> about Kiji and how to use it, please send them to me, or ask the kiji user
>> mailing list: http://www.kiji.org/getinvolved#Mailing_Lists
>>
>> cheers,
>> - Aaron
>>
>>
>> On Tue, Jan 29, 2013 at 3:24 PM, Doug Cutting <cu...@apache.org> wrote:
>>
>>> Avro and Trevni files do not support record update or delete.
>>>
>>> For large changing datasets you might use Kiji (http://www.kiji.org/)
>>> to store Avro data in HBase.
>>>
>>> Doug
>>>
>>> On Mon, Jan 28, 2013 at 12:00 PM, ccleve <cc...@gmail.com> wrote:
>>> > I've gone through the documentation, but haven't been able to get a
>>> definite
>>> > answer: is Avro, or specifically Trevni, only for read-only data?
>>> >
>>> > Is it possible to update or delete records?
>>> >
>>> > If records can be deleted, is there any code that will merge row sets
>>> to get
>>> > rid of the unused space?
>>> >
>>> >
>>> >
>>>
>>
>>
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>

Re: Is Avro/Trevni strictly read-only?

Posted by Russell Jurney <ru...@gmail.com>.
Aaron - is there a way to create a Kiji table from Pig? I'm in the habit of
not specifying schemas with Voldemort and MongoDB, just storing a Pig
relation and the schema is set in the store. If I can arrange that somehow,
I'm all over Kiji. Panthera is a fork :/


On Wed, Jan 30, 2013 at 3:20 PM, Aaron Kimball <ak...@gmail.com> wrote:

> Hi ccleve,
>
> I'd definitely urge you to try out Kiji -- we who work on it think it's a
> pretty good fit for this specific use case. If you've got further questions
> about Kiji and how to use it, please send them to me, or ask the kiji user
> mailing list: http://www.kiji.org/getinvolved#Mailing_Lists
>
> cheers,
> - Aaron
>
>
> On Tue, Jan 29, 2013 at 3:24 PM, Doug Cutting <cu...@apache.org> wrote:
>
>> Avro and Trevni files do not support record update or delete.
>>
>> For large changing datasets you might use Kiji (http://www.kiji.org/)
>> to store Avro data in HBase.
>>
>> Doug
>>
>> On Mon, Jan 28, 2013 at 12:00 PM, ccleve <cc...@gmail.com> wrote:
>> > I've gone through the documentation, but haven't been able to get a
>> definite
>> > answer: is Avro, or specifically Trevni, only for read-only data?
>> >
>> > Is it possible to update or delete records?
>> >
>> > If records can be deleted, is there any code that will merge row sets
>> to get
>> > rid of the unused space?
>> >
>> >
>> >
>>
>
>


-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Is Avro/Trevni strictly read-only?

Posted by Aaron Kimball <ak...@gmail.com>.
Hi ccleve,

I'd definitely urge you to try out Kiji -- we who work on it think it's a
pretty good fit for this specific use case. If you've got further questions
about Kiji and how to use it, please send them to me, or ask the kiji user
mailing list: http://www.kiji.org/getinvolved#Mailing_Lists

cheers,
- Aaron


On Tue, Jan 29, 2013 at 3:24 PM, Doug Cutting <cu...@apache.org> wrote:

> Avro and Trevni files do not support record update or delete.
>
> For large changing datasets you might use Kiji (http://www.kiji.org/)
> to store Avro data in HBase.
>
> Doug
>
> On Mon, Jan 28, 2013 at 12:00 PM, ccleve <cc...@gmail.com> wrote:
> > I've gone through the documentation, but haven't been able to get a
> definite
> > answer: is Avro, or specifically Trevni, only for read-only data?
> >
> > Is it possible to update or delete records?
> >
> > If records can be deleted, is there any code that will merge row sets to
> get
> > rid of the unused space?
> >
> >
> >
>

Re: Is Avro/Trevni strictly read-only?

Posted by Russell Jurney <ru...@gmail.com>.
Intel's HBase Panthera has an Avro document store built in -- another option:
https://github.com/intel-hadoop/hbase-0.94-panthera


On Tue, Jan 29, 2013 at 3:24 PM, Doug Cutting <cu...@apache.org> wrote:

> Avro and Trevni files do not support record update or delete.
>
> For large changing datasets you might use Kiji (http://www.kiji.org/)
> to store Avro data in HBase.
>
> Doug
>
> On Mon, Jan 28, 2013 at 12:00 PM, ccleve <cc...@gmail.com> wrote:
> > I've gone through the documentation, but haven't been able to get a
> definite
> > answer: is Avro, or specifically Trevni, only for read-only data?
> >
> > Is it possible to update or delete records?
> >
> > If records can be deleted, is there any code that will merge row sets to
> get
> > rid of the unused space?
> >
> >
> >
>



-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Is Avro/Trevni strictly read-only?

Posted by Doug Cutting <cu...@apache.org>.
Avro and Trevni files do not support record update or delete.

For large changing datasets you might use Kiji (http://www.kiji.org/)
to store Avro data in HBase.

Doug
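
To illustrate what that means for the "merge row sets" part of the question:
with Avro data files, reclaiming space from logically deleted records comes
down to rewriting the file and skipping the records you no longer want. A
minimal sketch with the generic Java API follows; the "deleted" boolean
field is a hypothetical tombstone convention chosen by the application, not
something Avro itself defines:

import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class CompactSketch {

  // Copy every record that is not marked deleted into a new file.
  // Avro data files are append-only, so a rewrite like this is how
  // unused space gets reclaimed; afterwards swap the new file into place.
  public static void compact(File in, File out) throws IOException {
    DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
        in, new GenericDatumReader<GenericRecord>());
    try {
      Schema schema = reader.getSchema();
      DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
          new GenericDatumWriter<GenericRecord>(schema));
      writer.create(schema, out);
      try {
        for (GenericRecord record : reader) {
          // Hypothetical tombstone field; any application-level
          // convention for marking deletes would work here.
          Object deleted = record.get("deleted");
          if (!Boolean.TRUE.equals(deleted)) {
            writer.append(record);
          }
        }
      } finally {
        writer.close();
      }
    } finally {
      reader.close();
    }
  }
}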

On Mon, Jan 28, 2013 at 12:00 PM, ccleve <cc...@gmail.com> wrote:
> I've gone through the documentation, but haven't been able to get a definite
> answer: is Avro, or specifically Trevni, only for read-only data?
>
> Is it possible to update or delete records?
>
> If records can be deleted, is there any code that will merge row sets to get
> rid of the unused space?
>
>
>