Posted to user@avro.apache.org by Mark <st...@gmail.com> on 2013/05/23 22:29:35 UTC

Is Avro right for me?

We're thinking about generating logs and events with Avro and shipping them to a central collector service via Flume. Is this a valid use case?


Re: Is Avro right for me?

Posted by Martin Kleppmann <ma...@rapportive.com>.
On 28 May 2013, at 23:38, Mark <st...@gmail.com> wrote:
> I actually looked into Kafka quite some time ago and I think we passed on it because it didn't have much Ruby support (that may have changed by now).

Ruby support unfortunately continues to be weak. I actually wrote a brand new Kafka producer client for Ruby (not yet open source), but from Kafka 0.7 to 0.8 the wire protocol is changing (as replication & high availability features are added), which means I'm going to have to re-do it.

The Kafka project officially supports producers using the JVM or the C client (the latter could be embedded in Ruby using FFI), and consumers using the JVM client only.

However, that doesn't rule out other languages. LinkedIn actually operates a large service that consumes Kafka using the JVM client, writes the Avro messages to stdout as JSON, and then pipes that into a Python process's stdin. It works surprisingly well!
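
For what it's worth, the non-JVM side of such a pipe is trivial. A minimal
Ruby sketch, assuming the JVM consumer writes one JSON-encoded message per
line to stdout (the field name below is hypothetical):

    # consume.rb -- run as: <jvm-kafka-consumer> | ruby consume.rb
    require 'json'

    $stdin.each_line do |line|
      event = JSON.parse(line)
      puts event['type']  # handle the decoded event here
    end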

Martin


Re: Is Avro right for me?

Posted by Felix GV <fe...@mate1inc.com>.
Also, if you end up choosing to use Kafka and persisting your messages into
Hadoop, then you should take a look at Camus
<https://github.com/linkedin/camus> (which is also from LinkedIn).

If you do things the LinkedIn way right from the start (i.e., using the
AVRO-1124 schema repo and encoding time in a standard way in a header
contained in all your schemas, as sketched below), then you can use Camus
pretty much out of the box without any tweaking, and the solution you'll
get is very flexible/extensible (regarding the ability to evolve your
schemas gracefully, letting Camus discover new topics to persist
automatically, etc.).
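
For illustration, a header-bearing schema along those lines might look like
this with the Ruby avro gem (the record and field names here are
hypothetical, not the actual LinkedIn/Camus conventions):

    require 'avro'

    EVENT_SCHEMA = Avro::Schema.parse(<<-JSON)
    {
      "type": "record", "name": "PageView",
      "fields": [
        {"name": "header", "type": {
          "type": "record", "name": "EventHeader",
          "fields": [
            {"name": "time",   "type": "long"},
            {"name": "server", "type": "string", "default": ""}
          ]}},
        {"name": "url", "type": "string"}
      ]
    }
    JSON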

For us it was a little more complicated, since we had some legacy stuff
that wasn't laid out quite the way Camus expected, but it wasn't that
complicated to integrate either...

--
Felix


On Thu, Jun 6, 2013 at 2:51 PM, Felix GV <fe...@mate1inc.com> wrote:

> You can serialize avro messages into json or binary format, and then pass
> them around to anything else (send them over HTTP, publish them to a
> message broker system like Kafka or Flume, write them directly into a data
> store, etc.). You can forget about the avro RPC, as it's just one way
> amongst many of doing this.
>
> You do need to manage schemas properly though. The easy way is to hardcode
> your schema on both ends, but then that makes it harder to evolve schemas
> (which avro can do very well otherwise). If you send single serialized avro
> messages around through a message broker system, then you should definitely
> consider using a version number for your schema at the beginning of the
> message, as Martin suggested. Then you can look up what schema each version
> number represents with something like the versioned schema repo in
> AVRO-1124 <https://issues.apache.org/jira/browse/AVRO-1124>.
>
> --
> Felix
>
>
> On Tue, Jun 4, 2013 at 11:10 PM, Mark <st...@gmail.com> wrote:
>
>> I have a question.  Say I want to use AVRO as my serialization format to
>> speak between service applications. Do I need to use AVRO RPC for this or
>> can I just exchange AVRO messages over HTTP?
>>
>> Also, what's the difference between an IPC client and an HTTP IPC client?
>> https://github.com/apache/avro/tree/trunk/lang/ruby/test
>>
>> Thanks
>>
>>
>> On May 29, 2013, at 8:02 PM, Mike Percy <mp...@apache.org> wrote:
>>
>> There is no Ruby support for the Netty Avro RPC protocol that I know of.
>> But I'm not sure why that matters, other than the fact that the Flume
>> Thrift support isn't in an official release yet.
>>
>> You could also take a look at the Flume HTTP source for a REST-based
>> interface, but to accept binary data instead of JSON (the default) you
>> would need to write a small bit of Java code and plug that in.
>>
>> Make sure you differentiate between using Avro as a data storage format
>> and as an RPC mechanism. They are two very different things and don't need
>> to be tied together. Today, the data storage aspect is more mature and has
>> much wider language support.
>>
>> Mike
>>
>>
>> On Wed, May 29, 2013 at 9:30 AM, Mark <st...@gmail.com> wrote:
>>
>>> So basically Avro RPC is out of the question? Instead I would need to
>>> Avro Message -> Thrift -> Flume? Is that along the right lines or am I
>>> missing something?
>>>
>>>
>>> On May 28, 2013, at 5:02 PM, Mike Percy <mp...@apache.org> wrote:
>>>
>>> Regarding Ruby support, we recently added support for Thrift RPC, so you
>>> can now send messages to Flume via Ruby and other non-JVM languages. We
>>> don't have out-of-the-box client APIs for those yet but would be happy to
>>> accept patches for it :)
>>>
>>>
>>>
>>
>>
>

Re: Is Avro right for me?

Posted by Felix GV <fe...@mate1inc.com>.
You can serialize avro messages into json or binary format, and then pass
them around to anything else (send them over HTTP, publish them to a
message broker system like Kafka or Flume, write them directly into a data
store, etc.). You can forget about the avro RPC, as it's just one way
amongst many of doing this.

You do need to manage schemas properly though. The easy way is to hardcode
your schema on both ends, but then that makes it harder to evolve schemas
(which avro can do very well otherwise). If you send single serialized avro
messages around through a message broker system, then you should definitely
consider using a version number for your schema at the beginning of the
message, as Martin suggested. Then you can look up what schema each version
number represents with something like the versioned schema repo in
AVRO-1124 <https://issues.apache.org/jira/browse/AVRO-1124>.
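
For illustration, the producer side of that framing might look like this
with the Ruby avro gem (a sketch: the single leading version byte and the
.avsc path are assumptions, not something AVRO-1124 prescribes):

    require 'avro'
    require 'stringio'

    SCHEMA_VERSION = 1  # the version you registered for this schema
    SCHEMA = Avro::Schema.parse(File.read('event.avsc'))

    def encode(datum)
      buffer  = StringIO.new
      encoder = Avro::IO::BinaryEncoder.new(buffer)
      Avro::IO::DatumWriter.new(SCHEMA).write(datum, encoder)
      # 1-byte version number, then the Avro-encoded body
      [SCHEMA_VERSION].pack('C') + buffer.string.b
    end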

--
Felix


On Tue, Jun 4, 2013 at 11:10 PM, Mark <st...@gmail.com> wrote:

> I have a question.  Say I want to use AVRO as my serialization format to
> speak between service applications. Do I need to use AVRO RPC for this or
> can I just exchange AVRO messages over HTTP?
>
> Also, what's the difference between an IPC client and an HTTP IPC client?
> https://github.com/apache/avro/tree/trunk/lang/ruby/test
>
> Thanks
>
>
> On May 29, 2013, at 8:02 PM, Mike Percy <mp...@apache.org> wrote:
>
> There is no Ruby support for the Netty Avro RPC protocol that I know of.
> But I'm not sure why that matters, other than the fact that the Flume
> Thrift support isn't in an official release yet.
>
> You could also take a look at the Flume HTTP source for a REST-based
> interface, but to accept binary data instead of JSON (the default) you
> would need to write a small bit of Java code and plug that in.
>
> Make sure you differentiate between using Avro as a data storage format
> and as an RPC mechanism. They are two very different things and don't need
> to be tied together. Today, the data storage aspect is more mature and has
> much wider language support.
>
> Mike
>
>
> On Wed, May 29, 2013 at 9:30 AM, Mark <st...@gmail.com> wrote:
>
>> So basically Avro RPC is out of the question? Instead I would need to
>> Avro Message -> Thrift -> Flume? Is that along the right lines or am I
>> missing something?
>>
>>
>> On May 28, 2013, at 5:02 PM, Mike Percy <mp...@apache.org> wrote:
>>
>> Regarding Ruby support, we recently added support for Thrift RPC, so you
>> can now send messages to Flume via Ruby and other non-JVM languages. We
>> don't have out-of-the-box client APIs for those yet but would be happy to
>> accept patches for it :)
>>
>>
>>
>
>

Re: Is Avro right for me?

Posted by Mark <st...@gmail.com>.
I have a question.  Say I want to use AVRO as my serialization format to speak between service applications. Do I need to use AVRO RPC for this or can I just exchange AVRO messages over HTTP?

Also, what's the difference between an IPC client and an HTTP IPC client? https://github.com/apache/avro/tree/trunk/lang/ruby/test

Thanks

On May 29, 2013, at 8:02 PM, Mike Percy <mp...@apache.org> wrote:

> There is no Ruby support for the Netty Avro RPC protocol that I know of. But I'm not sure why that matters, other than the fact that the Flume Thrift support isn't in an official release yet.
> 
> You could also take a look at the Flume HTTP source for a REST-based interface, but to accept binary data instead of JSON (the default) you would need to write a small bit of Java code and plug that in.
> 
> Make sure you differentiate between using Avro as a data storage format and as an RPC mechanism. They are two very different things and don't need to be tied together. Today, the data storage aspect is more mature and has much wider language support.
> 
> Mike
> 
> 
> On Wed, May 29, 2013 at 9:30 AM, Mark <st...@gmail.com> wrote:
> So basically Avro RPC is out of the question? Instead I would need to Avro Message -> Thrift -> Flume? Is that along the right lines or am I missing something?
> 
> 
> On May 28, 2013, at 5:02 PM, Mike Percy <mp...@apache.org> wrote:
> 
>> Regarding Ruby support, we recently added support for Thrift RPC, so you can now send messages to Flume via Ruby and other non-JVM languages. We don't have out-of-the-box client APIs for those yet but would be happy to accept patches for it :)
> 
> 


Re: Is Avro right for me?

Posted by Mark <st...@gmail.com>.
> Make sure you differentiate between using Avro as a data storage format and as an RPC mechanism. They are two very different things and don't need to be tied together. Today, the data storage aspect is more mature and has much wider language support.


I think that's my problem. I'm trying to use it all or nothing (serialization and RPC).

On May 29, 2013, at 8:02 PM, Mike Percy <mp...@apache.org> wrote:

> There is no Ruby support for the Netty Avro RPC protocol that I know of. But I'm not sure why that matters, other than the fact that the Flume Thrift support isn't in an official release yet.
> 
> You could also take a look at the Flume HTTP source for a REST-based interface, but to accept binary data instead of JSON (the default) you would need to write a small bit of Java code and plug that in.
> 
> Make sure you differentiate between using Avro as a data storage format and as an RPC mechanism. They are two very different things and don't need to be tied together. Today, the data storage aspect is more mature and has much wider language support.
> 
> Mike
> 
> 
> On Wed, May 29, 2013 at 9:30 AM, Mark <st...@gmail.com> wrote:
> So basically Avro RPC is out of the question? Instead I would need to Avro Message -> Thrift -> Flume? Is that along the right lines or am I missing something?
> 
> 
> On May 28, 2013, at 5:02 PM, Mike Percy <mp...@apache.org> wrote:
> 
>> Regarding Ruby support, we recently added support for Thrift RPC, so you can now send messages to Flume via Ruby and other non-JVM languages. We don't have out-of-the-box client APIs for those yet but would be happy to accept patches for it :)
> 
> 


Re: Is Avro right for me?

Posted by Mike Percy <mp...@apache.org>.
There is no Ruby support for the Netty Avro RPC protocol that I know of.
But I'm not sure why that matters, other than the fact that the Flume
Thrift support isn't in an official release yet.

You could also take a look at the Flume HTTP source for a REST-based
interface, but to accept binary data instead of JSON (the default) you
would need to write a small bit of Java code and plug that in.
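
For illustration, posting to the HTTP source's default JSON handler from
Ruby might look roughly like this (a sketch: the port and the header name
are placeholders, and it assumes the default handler's
JSON-array-of-events format with string bodies):

    require 'net/http'
    require 'json'
    require 'uri'

    payload = '{"msg":"hello"}'  # whatever you put in the event body
    events  = [{ 'headers' => { 'schema_version' => '1' },
                 'body'    => payload }]

    Net::HTTP.post(URI('http://flume-host:5140'),
                   JSON.generate(events),
                   'Content-Type' => 'application/json')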

Make sure you differentiate between using Avro as a data storage format and
as an RPC mechanism. They are two very different things and don't need to
be tied together. Today, the data storage aspect is more mature and has
much wider language support.

Mike


On Wed, May 29, 2013 at 9:30 AM, Mark <st...@gmail.com> wrote:

> So basically Avro RPC is out of the question? Instead I would need to Avro
> Message -> Thrift -> Flume? Is that along the right lines or am I missing
> something?
>
>
> On May 28, 2013, at 5:02 PM, Mike Percy <mp...@apache.org> wrote:
>
> Regarding Ruby support, we recently added support for Thrift RPC, so you
> can now send messages to Flume via Ruby and other non-JVM languages. We
> don't have out-of-the-box client APIs for those yet but would be happy to
> accept patches for it :)
>
>
>

Re: Is Avro right for me?

Posted by Mark <st...@gmail.com>.
So basically Avro RPC is out of the question? Instead I would need to Avro Message -> Thrift -> Flume? Is that along the right lines or am I missing something?

On May 28, 2013, at 5:02 PM, Mike Percy <mp...@apache.org> wrote:

> Regarding Ruby support, we recently added support for Thrift RPC, so you can now send messages to Flume via Ruby and other non-JVM languages. We don't have out-of-the-box client APIs for those yet but would be happy to accept patches for it :)


Re: Is Avro right for me?

Posted by Mike Percy <mp...@apache.org>.
Flume is actually working on what you might call "first class" Avro support
right now, but you can already use it today, and there are people doing so
successfully in production.

First of all, I assume that you want to store binary-encoded Avro in each
event. As mentioned previously in this thread, this implies that the schema
needs to come from somewhere. Right now, with the released version of Flume
(1.3.1), you would want to write your own EventSerializer
<http://flume.apache.org/FlumeUserGuide.html#event-serializers> for each
schema you need to write to HDFS. There is a base class
<http://flume.apache.org/releases/content/1.3.1/apidocs/org/apache/flume/serialization/AbstractAvroEventSerializer.html>
you can subclass that makes it easier to serialize Avro at that level.

There is a bunch of new development underway to make this a lot easier to
deal with.

1. Something to parse Avro container files and send them to Flume:
https://issues.apache.org/jira/browse/FLUME-2048
2. A generic event serializer that keys off a hash in the event header to
determine the schema (see the sketch below): https://issues.apache.org/jira/browse/FLUME-2010
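
On the producer side, tagging each event with a schema hash could look
roughly like this Ruby sketch (the header name and the
MD5-over-schema-JSON fingerprint are made up for illustration, not what
FLUME-2010 specifies; Avro also defines canonical-form fingerprints, which
are more robust):

    require 'avro'
    require 'digest'

    schema = Avro::Schema.parse(File.read('event.avsc'))

    # Naive fingerprint: hash the schema's JSON text.
    schema_hash = Digest::MD5.hexdigest(schema.to_s)

    event = {
      'headers' => { 'avro.schema.hash' => schema_hash },  # hypothetical header name
      'body'    => '...'  # binary Avro bytes from your DatumWriter
    }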

Regarding Ruby support, we recently added support for Thrift RPC, so you
can now send messages to Flume via Ruby and other non-JVM languages. We
don't have out-of-the-box client APIs for those yet but would be happy to
accept patches for it :)

Feel free to reach out to dev@flume.apache.org or user@flume.apache.org if
you'd like more information or want to help get these features finalized
sooner!

Mike



On Tue, May 28, 2013 at 3:38 PM, Mark <st...@gmail.com> wrote:

> Thanks for all of the information.
>
> I actually looked into Kafka quite some time ago and I think we passed on
> it because it didn't have much Ruby support (that may have changed by now).
>
>
> On May 27, 2013, at 12:34 PM, Martin Kleppmann <ma...@rapportive.com>
> wrote:
>
> On 27 May 2013 20:00, Stefan Krawczyk <st...@nextdoor.com> wrote:
>
>> So it's up to you what you stick into the body of that Avro event. It
>> could just be json, or it could be your own serialized Avro event - and as
>> far as I understand serialized Avro always has the schema with it (right?).
>>
>
> In an Avro data file, yes, because you just need to specify the schema
> once, followed by (say) a million records that all use the same schema. And
> in an RPC context, you can negotiate the schema once per connection. But
> when using a message broker, you're serializing individual records and
> don't have an end-to-end connection with the consumer, so you'd need to
> include the schema with every single message.
>
> It probably doesn't make sense to include the full schema with every one,
> as a typical schema might be 2 kB whereas a serialized record might be less
> than 100 bytes (numbers obviously vary wildly by application), so the
> schema size would dominate. Hence my suggestion of including a schema
> version number or hash with every message.
>
>> Be aware that Flume doesn't have great support for languages outside of
>> the JVM.
>>
>
> The same caveat unfortunately applies with Kafka too. There are clients
> for non-JVM languages, but they lack important features, so I would
> recommend using the official JVM client (if your application is non-JVM you
> could simply pipe your application's stdout into the Kafka producer, or
> vice versa on the consumer side).
>
> Martin
>
>
>

Re: Is Avro right for me?

Posted by Mark <st...@gmail.com>.
Thanks for all of the information.

I actually looked into Kafka quite some time ago and I think we passed on it because it didn't have much Ruby support (that may have changed by now).


On May 27, 2013, at 12:34 PM, Martin Kleppmann <ma...@rapportive.com> wrote:

> On 27 May 2013 20:00, Stefan Krawczyk <st...@nextdoor.com> wrote:
> So it's up to you what you stick into the body of that Avro event. It could just be json, or it could be your own serialized Avro event - and as far as I understand serialized Avro always has the schema with it (right?).
> 
> In an Avro data file, yes, because you just need to specify the schema once, followed by (say) a million records that all use the same schema. And in an RPC context, you can negotiate the schema once per connection. But when using a message broker, you're serializing individual records and don't have an end-to-end connection with the consumer, so you'd need to include the schema with every single message.
> 
> It probably doesn't make sense to include the full schema with every one, as a typical schema might be 2 kB whereas a serialized record might be less than 100 bytes (numbers obviously vary wildly by application), so the schema size would dominate. Hence my suggestion of including a schema version number or hash with every message.
> 
> Be aware that Flume doesn't have great support for languages outside of the JVM.
> 
> The same caveat unfortunately applies with Kafka too. There are clients for non-JVM languages, but they lack important features, so I would recommend using the official JVM client (if your application is non-JVM you could simply pipe your application's stdout into the Kafka producer, or vice versa on the consumer side).
> 
> Martin
> 


Re: Is Avro right for me?

Posted by Martin Kleppmann <ma...@rapportive.com>.
On 27 May 2013 20:00, Stefan Krawczyk <st...@nextdoor.com> wrote:

> So it's up to you what you stick into the body of that Avro event. It
> could just be json, or it could be your own serialized Avro event - and as
> far as I understand serialized Avro always has the schema with it (right?).
>

In an Avro data file, yes, because you just need to specify the schema
once, followed by (say) a million records that all use the same schema. And
in an RPC context, you can negotiate the schema once per connection. But
when using a message broker, you're serializing individual records and
don't have an end-to-end connection with the consumer, so you'd need to
include the schema with every single message.
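
For example, a minimal sketch of that container-file pattern with the Ruby
avro gem:

    require 'avro'

    SCHEMA = Avro::Schema.parse(
      '{"type":"record","name":"LogLine",' \
      '"fields":[{"name":"msg","type":"string"}]}')

    File.open('events.avro', 'wb') do |f|
      # The schema is written once into the file header; every record
      # after that is encoded against it.
      writer = Avro::DataFile::Writer.new(
        f, Avro::IO::DatumWriter.new(SCHEMA), SCHEMA)
      1_000.times { |i| writer << { 'msg' => "event #{i}" } }
      writer.close
    end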

It probably doesn't make sense to include the full schema with every one,
as a typical schema might be 2 kB whereas a serialized record might be less
than 100 bytes (numbers obviously vary wildly by application), so the
schema size would dominate. Hence my suggestion of including a schema
version number or hash with every message.

> Be aware that Flume doesn't have great support for languages outside of the
> JVM.
>

The same caveat unfortunately applies with Kafka too. There are clients for
non-JVM languages, but they lack important features, so I would recommend
using the official JVM client (if your application is non-JVM you could
simply pipe your application's stdout into the Kafka producer, or vice
versa on the consumer side).

Martin

Re: Is Avro right for me?

Posted by Stefan Krawczyk <st...@nextdoor.com>.
Mark:
FWIW, I would go with Kafka if you can; it's far more flexible. We aren't
using it until it authenticates producers and consumers and provides a way
to encrypt transport, since we run in the cloud...

Anyway, so we're using Flume. With the current out-of-the-box
implementation, Flume encapsulates data in an Avro event itself.

So it's up to you what you stick into the body of that Avro event. It could
just be json, or it could be your own serialized Avro event - and as far as
I understand serialized Avro always has the schema with it (right?).

Be aware that Flume doesn't have great support for languages outside of the
JVM. Flume's Avro source that you can communicate with via Avro RPC uses
NettyServer/NettyTransceiver underneath, and as far as I know, there have
been no updates to other Avro RPC libraries (e.g., Python, Ruby) that
enable talking to such an Avro RPC endpoint. So you either have to build a
client that speaks that protocol, or create your own source.

Cheers,

Stefan


On Mon, May 27, 2013 at 11:08 AM, Russell Jurney <ru...@gmail.com> wrote:

> What's more, there are examples and support for Kafka, but not so much for
> Flume.
>
>
> On Mon, May 27, 2013 at 6:25 AM, Martin Kleppmann <ma...@rapportive.com> wrote:
>
>> I don't have experience with Flume, so I can't comment on that. At
>> LinkedIn we ship logs around by sending Avro-encoded messages to Kafka (
>> http://kafka.apache.org/). Kafka is nice, it scales very well and gives
>> a great deal of flexibility — logs can be consumed by any number of
>> independent consumers, consumers can catch up on a backlog if they're
>> disconnected for a while, and it comes with Hadoop import out of the box.
>>
>> (RabbitMQ is designed more for use cases where each message corresponds to
>> a task that needs to be performed by a worker. IMHO Kafka is a better fit
>> for logs, which are more stream-like.)
>>
>> With any message broker, you'll need to somehow tag each message with the
>> schema that was used to encode it. You could include the full schema with
>> every message, but unless you have very large messages, that would be a
>> huge overhead. Better to give each version of your schema a sequential
>> version number, or hash the schema, and include the version number/hash in
>> each message. You can then keep a repository of schemas for resolving those
>> version numbers or hashes – simply in files that you distribute to all
>> producers/consumers, or in a simple REST service like
>> https://issues.apache.org/jira/browse/AVRO-1124
>>
>> Hope that helps,
>> Martin
>>
>>
>> On 26 May 2013 17:39, Mark <st...@gmail.com> wrote:
>>
>>> Yes our central server would be Hadoop.
>>>
>>> Exactly how would this work with Flume? Would I write Avro to a file
>>> source which Flume would then ship over to one of our collectors, or is
>>> there a better/native way? Would I have to include the schema in each
>>> event? FYI, we would be doing this primarily from a Rails application.
>>>
>>> Does anyone ever use Avro with a message bus like RabbitMQ?
>>>
>>> On May 23, 2013, at 9:16 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>>
>>> Yep. Avro would be great at that (provided your central consumer is Avro
>>> friendly, like a Hadoop system).  Make sure that all of your schemas have
>>> default values defined for fields so that schema evolution will be easier
>>> in the future.
>>>
>>>
>>> On Thu, May 23, 2013 at 4:29 PM, Mark <st...@gmail.com> wrote:
>>>
>>>> We're thinking about generating logs and events with Avro and shipping
>>>> them to a central collector service via Flume. Is this a valid use case?
>>>>
>>>>
>>>
>>>
>>> --
>>> Sean
>>>
>>>
>>>
>>
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>

Re: Is Avro right for me?

Posted by Russell Jurney <ru...@gmail.com>.
What's more, there are examples and support for Kafka, but not so much for
Flume.


On Mon, May 27, 2013 at 6:25 AM, Martin Kleppmann <ma...@rapportive.com> wrote:

> I don't have experience with Flume, so I can't comment on that. At
> LinkedIn we ship logs around by sending Avro-encoded messages to Kafka (
> http://kafka.apache.org/). Kafka is nice, it scales very well and gives a
> great deal of flexibility — logs can be consumed by any number of
> independent consumers, consumers can catch up on a backlog if they're
> disconnected for a while, and it comes with Hadoop import out of the box.
>
> (RabbitMQ is designed more for use cases where each message corresponds to
> a task that needs to be performed by a worker. IMHO Kafka is a better fit
> for logs, which are more stream-like.)
>
> With any message broker, you'll need to somehow tag each message with the
> schema that was used to encode it. You could include the full schema with
> every message, but unless you have very large messages, that would be a
> huge overhead. Better to give each version of your schema a sequential
> version number, or hash the schema, and include the version number/hash in
> each message. You can then keep a repository of schemas for resolving those
> version numbers or hashes – simply in files that you distribute to all
> producers/consumers, or in a simple REST service like
> https://issues.apache.org/jira/browse/AVRO-1124
>
> Hope that helps,
> Martin
>
>
> On 26 May 2013 17:39, Mark <st...@gmail.com> wrote:
>
>> Yes our central server would be Hadoop.
>>
>> Exactly how would this work with Flume? Would I write Avro to a file
>> source which Flume would then ship over to one of our collectors, or is
>> there a better/native way? Would I have to include the schema in each
>> event? FYI, we would be doing this primarily from a Rails application.
>>
>> Does anyone ever use Avro with a message bus like RabbitMQ?
>>
>> On May 23, 2013, at 9:16 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>
>> Yep. Avro would be great at that (provided your central consumer is Avro
>> friendly, like a Hadoop system).  Make sure that all of your schemas have
>> default values defined for fields so that schema evolution will be easier
>> in the future.
>>
>>
>> On Thu, May 23, 2013 at 4:29 PM, Mark <st...@gmail.com> wrote:
>>
>>> We're thinking about generating logs and events with Avro and shipping
>>> them to a central collector service via Flume. Is this a valid use case?
>>>
>>>
>>
>>
>> --
>> Sean
>>
>>
>>
>


-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Is Avro right for me?

Posted by Martin Kleppmann <ma...@rapportive.com>.
I don't have experience with Flume, so I can't comment on that. At LinkedIn
we ship logs around by sending Avro-encoded messages to Kafka (
http://kafka.apache.org/). Kafka is nice, it scales very well and gives a
great deal of flexibility — logs can be consumed by any number of
independent consumers, consumers can catch up on a backlog if they're
disconnected for a while, and it comes with Hadoop import out of the box.

(RabbitMQ is designed more for use cases where each message corresponds to a
task that needs to be performed by a worker. IMHO Kafka is a better fit for
logs, which are more stream-like.)

With any message broker, you'll need to somehow tag each message with the
schema that was used to encode it. You could include the full schema with
every message, but unless you have very large messages, that would be a
huge overhead. Better to give each version of your schema a sequential
version number, or hash the schema, and include the version number/hash in
each message. You can then keep a repository of schemas for resolving those
version numbers or hashes – simply in files that you distribute to all
producers/consumers, or in a simple REST service like
https://issues.apache.org/jira/browse/AVRO-1124
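
For illustration, the consumer side of that lookup might look like this in
Ruby (a sketch: the repo URL shape and the single leading version byte are
assumptions, since AVRO-1124 doesn't prescribe either):

    require 'avro'
    require 'net/http'
    require 'stringio'

    SCHEMA_CACHE = {}

    def schema_for(version)
      SCHEMA_CACHE[version] ||= Avro::Schema.parse(
        Net::HTTP.get(URI("http://schema-repo.example.com/events/#{version}")))
    end

    def decode(message)
      version, payload = message.unpack('Ca*')  # 1-byte version + Avro body
      reader = Avro::IO::DatumReader.new(schema_for(version))
      reader.read(Avro::IO::BinaryDecoder.new(StringIO.new(payload)))
    end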

Hope that helps,
Martin


On 26 May 2013 17:39, Mark <st...@gmail.com> wrote:

> Yes our central server would be Hadoop.
>
> Exactly how would this work with Flume? Would I write Avro to a file
> source which Flume would then ship over to one of our collectors, or is
> there a better/native way? Would I have to include the schema in each
> event? FYI, we would be doing this primarily from a Rails application.
>
> Does anyone ever use Avro with a message bus like RabbitMQ?
>
> On May 23, 2013, at 9:16 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
> Yep. Avro would be great at that (provided your central consumer is Avro
> friendly, like a Hadoop system).  Make sure that all of your schemas have
> default values defined for fields so that schema evolution will be easier
> in the future.
>
>
> On Thu, May 23, 2013 at 4:29 PM, Mark <st...@gmail.com> wrote:
>
>> We're thinking about generating logs and events with Avro and shipping
>> them to a central collector service via Flume. Is this a valid use case?
>>
>>
>
>
> --
> Sean
>
>
>

Re: Is Avro right for me?

Posted by Mark <st...@gmail.com>.
Yes, our central server would be Hadoop.

Exactly how would this work with Flume? Would I write Avro to a file source which Flume would then ship over to one of our collectors, or is there a better/native way? Would I have to include the schema in each event? FYI, we would be doing this primarily from a Rails application.

Does anyone ever use Avro with a message bus like RabbitMQ? 

On May 23, 2013, at 9:16 PM, Sean Busbey <bu...@cloudera.com> wrote:

> Yep. Avro would be great at that (provided your central consumer is Avro friendly, like a Hadoop system).  Make sure that all of your schemas have default values defined for fields so that schema evolution will be easier in the future.
> 
> 
> On Thu, May 23, 2013 at 4:29 PM, Mark <st...@gmail.com> wrote:
> We're thinking about generating logs and events with Avro and shipping them to a central collector service via Flume. Is this a valid use case?
> 
> 
> 
> 
> -- 
> Sean


Re: Is Avro right for me?

Posted by Sean Busbey <bu...@cloudera.com>.
Yep. Avro would be great at that (provided your central consumer is Avro
friendly, like a Hadoop system).  Make sure that all of your schemas have
default values defined for fields so that schema evolution will be easier
in the future.
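
For illustration, here is why the defaults matter, sketched with the Ruby
avro gem: data written with an old schema still decodes after you add a
field, as long as the new field has a default:

    require 'avro'
    require 'stringio'

    V1 = Avro::Schema.parse('{"type":"record","name":"LogEvent",' \
         '"fields":[{"name":"msg","type":"string"}]}')
    V2 = Avro::Schema.parse('{"type":"record","name":"LogEvent",' \
         '"fields":[{"name":"msg","type":"string"},' \
         '{"name":"level","type":"string","default":"info"}]}')

    buf = StringIO.new
    Avro::IO::DatumWriter.new(V1).write({ 'msg' => 'hello' },
                                        Avro::IO::BinaryEncoder.new(buf))

    # DatumReader takes the writer's schema and the reader's schema.
    reader = Avro::IO::DatumReader.new(V1, V2)
    reader.read(Avro::IO::BinaryDecoder.new(StringIO.new(buf.string)))
    # => {"msg"=>"hello", "level"=>"info"}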


On Thu, May 23, 2013 at 4:29 PM, Mark <st...@gmail.com> wrote:

> We're thinking about generating logs and events with Avro and shipping
> them to a central collector service via Flume. Is this a valid use case?
>
>


-- 
Sean