Posted to hdfs-user@hadoop.apache.org by Robert Nicholson <ro...@gmail.com> on 2012/08/19 18:46:37 UTC

Can Hadoop replace the use of MQ b/w processes?

We have an application, or a series of applications, that listen to incoming feeds; they then distribute this data in XML form to a number of queues. Another set of processes listens to these queues and processes the messages. Order of processing is important insofar as related messages need to be processed in sequence; hence, today all related messages go to the same queue and are processed by the same queue consumer.

The idea would be to replace the use of MQ with some kind of reliable distributed dispatch. Does Hadoop provide that?
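
A minimal sketch of the routing described above, assuming each message carries a hypothetical key that identifies its group of related messages. The queue count, the key format, and the function names are illustrative assumptions, not details of the original system. Hashing the key to a fixed queue index keeps all messages for one key on a single queue, in arrival order:

```python
import zlib
from collections import defaultdict

NUM_QUEUES = 4  # assumption: illustrative queue count


def queue_for(key: str, num_queues: int = NUM_QUEUES) -> int:
    """Map a message key to a stable queue index."""
    # crc32 gives the same value in every process, unlike the built-in
    # hash() for strings, which is randomized per interpreter run.
    return zlib.crc32(key.encode("utf-8")) % num_queues


def dispatch(messages, num_queues: int = NUM_QUEUES):
    """Append each (key, payload) pair to its queue, preserving arrival order."""
    queues = defaultdict(list)
    for key, payload in messages:
        queues[queue_for(key, num_queues)].append((key, payload))
    return queues


if __name__ == "__main__":
    feed = [("order-1", "<new/>"), ("order-2", "<new/>"), ("order-1", "<amend/>")]
    for index, items in sorted(dispatch(feed).items()):
        print(index, items)
```

With one consumer per queue, related messages are handled in sequence by the same consumer, while unrelated keys can be processed in parallel.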

Re: Can Hadoop replace the use of MQ b/w processes?

Posted by Karthik Kambatla <ka...@cloudera.com>.

Hi Robert

To add to Russell's answer:

If real-time processing of events is required, you might want to use a
stream-processing system like Apache S4 or Twitter's Storm.

Karthik

On Sun, Aug 19, 2012 at 10:27 AM, Russell Jurney
<ru...@gmail.com> wrote:

> The model with Hadoop would be to aggregate and write your events to
> the Hadoop Distributed File System, and then process them with
> scheduled batch jobs via Hadoop MapReduce. If your requirements can
> tolerate some latency, then Hadoop can work for you. Depending on your
> processing, you can schedule jobs as often as every hour, half hour,
> or fifteen minutes. I'm not aware of anyone scheduling jobs more
> frequently than that, but they may be. Chime in if you are.
>
> For getting events to HDFS, look at Flume, Kafka and Scribe. For
> processing events, look at Pig, Hive and Cascading. For scheduling
> jobs, look at Oozie and Azkaban.
>
> Russell Jurney http://datasyndrome.com
>
> On Aug 19, 2012, at 9:47 AM, Robert Nicholson
> <ro...@gmail.com> wrote:
>
> > We have an application, or a series of applications, that listen to
> > incoming feeds; they then distribute this data in XML form to a number of
> > queues. Another set of processes listens to these queues and processes the
> > messages. Order of processing is important insofar as related messages
> > need to be processed in sequence; hence, today all related messages go to
> > the same queue and are processed by the same queue consumer.
> >
> > The idea would be to replace the use of MQ with some kind of reliable
> > distributed dispatch. Does Hadoop provide that?
> >
> >
> >
>

Re: Can Hadoop replace the use of MQ b/w processes?

Posted by Russell Jurney <ru...@gmail.com>.

The model with Hadoop would be to aggregate and write your events to
the Hadoop Distributed File System, and then process them with
scheduled batch jobs via Hadoop MapReduce. If your requirements can
tolerate some latency, then Hadoop can work for you. Depending on your
processing, you can schedule jobs as often as every hour, half hour,
or fifteen minutes. I'm not aware of anyone scheduling jobs more
frequently than that, but they may be. Chime in if you are.
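
A minimal sketch of the batching idea above, assuming events are grouped into fixed 15-minute windows and written under a hypothetical `/events/...` directory layout that a scheduled job would later read. The path scheme and function names are illustrative assumptions, not something Hadoop prescribes:

```python
from datetime import datetime

WINDOW_MINUTES = 15  # assumption: the shortest schedule mentioned above


def window_start(ts: datetime, minutes: int = WINDOW_MINUTES) -> datetime:
    """Round a timestamp down to the start of its batch window."""
    # Intended for window sizes that divide 60 evenly (e.g. 15, 30, 60 is
    # handled by using hourly paths instead).
    floored_minute = ts.minute - ts.minute % minutes
    return ts.replace(minute=floored_minute, second=0, microsecond=0)


def batch_path(ts: datetime, root: str = "/events") -> str:
    """Hypothetical directory holding all events for the timestamp's window."""
    start = window_start(ts)
    return f"{root}/{start:%Y/%m/%d/%H%M}"


if __name__ == "__main__":
    event_time = datetime(2012, 8, 19, 18, 46, 37)
    print(batch_path(event_time))
```

Each scheduled run would then read only the directory for the window that has just closed, which is where the latency of this model comes from.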

For getting events to HDFS, look at Flume, Kafka and Scribe. For
processing events, look at Pig, Hive and Cascading. For scheduling
jobs, look at Oozie and Azkaban.

Russell Jurney http://datasyndrome.com

On Aug 19, 2012, at 9:47 AM, Robert Nicholson
<ro...@gmail.com> wrote:

> We have an application, or a series of applications, that listen to incoming feeds; they then distribute this data in XML form to a number of queues. Another set of processes listens to these queues and processes the messages. Order of processing is important insofar as related messages need to be processed in sequence; hence, today all related messages go to the same queue and are processed by the same queue consumer.
>
> The idea would be to replace the use of MQ with some kind of reliable distributed dispatch. Does Hadoop provide that?
>
>
>

Re: Can Hadoop replace the use of MQ b/w processes?

Posted by Ted Dunning <td...@maprtech.com>.

There is another much more active fork of Azkaban.  See

https://github.com/rbpark/azkaban



On Sun, Aug 19, 2012 at 6:57 PM, Lance Norskog <go...@gmail.com> wrote:

> Cool. I'm on the sidelines of a project trying to use Oozie in a large
> Hadoop-ecology app. Oozie is the one thing marked 'to be replaced'.
>
> On Sun, Aug 19, 2012 at 6:31 PM, Russell Jurney
> <ru...@gmail.com> wrote:
> > Glad to hear about Hamake. FWIW, I've had good success with Azkaban in
> > the past for very complex, lengthy Hadoop/Pig/Streaming pipelines. It
> > even has a DAG GUI.
> >
> >
> > On Sun, Aug 19, 2012 at 5:43 PM, Lance Norskog <go...@gmail.com> wrote:
> >>
> >> Last checkin on Azkaban was 11 months ago:
> >>
> >> https://github.com/azkaban/azkaban/commit/b105570625bcb2002de1acf4012c8d0e4388470a
> >>
> >> But, the last checkin for Hamake was June 2010. And it's still a cool
> >> little Hadoop/Pig scheduler.
> >> http://hamake.googlecode.com/
> >>
> >> On Sun, Aug 19, 2012 at 2:49 PM, Michael Segel
> >> <mi...@hotmail.com> wrote:
> >> > There has been some work to replace the use of queues with HBase.
> >> > This would be used to feed processes off the queue to help balance out
> >> > the load on the cluster.
> >> >
> >> > In one specific use case, this was effective because the time spent
> >> > processing each mapper.map() iteration is a couple of orders of
> >> > magnitude greater than the time it takes to pull the data from the
> >> > 'queue' and to each node for processing.
> >> >
> >> > Again, YMMV, it is an interesting hack though....
> >> >
> >> > On Aug 19, 2012, at 11:46 AM, Robert Nicholson
> >> > <ro...@gmail.com> wrote:
> >> >
> >> >> We have an application, or a series of applications, that listen to
> >> >> incoming feeds; they then distribute this data in XML form to a
> >> >> number of queues. Another set of processes listens to these queues and
> >> >> processes the messages. Order of processing is important insofar as
> >> >> related messages need to be processed in sequence; hence, today all
> >> >> related messages go to the same queue and are processed by the same
> >> >> queue consumer.
> >> >>
> >> >> The idea would be to replace the use of MQ with some kind of reliable
> >> >> distributed dispatch. Does Hadoop provide that?
> >> >>
> >> >>
> >> >>
> >> >>
> >> >
> >>
> >>
> >>
> >> --
> >> Lance Norskog
> >> goksron@gmail.com
> >
> >
> >
> >
> > --
> > Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>

Re: Can Hadoop replace the use of MQ b/w processes?

Posted by Lance Norskog <go...@gmail.com>.

Cool. I'm on the sidelines of a project trying to use Oozie in a large
Hadoop-ecology app. Oozie is the one thing marked 'to be replaced'.

On Sun, Aug 19, 2012 at 6:31 PM, Russell Jurney
<ru...@gmail.com> wrote:
> Glad to hear about Hamake. FWIW, I've had good success with Azkaban in the
> past for very complex, lengthy Hadoop/Pig/Streaming pipelines. It even has a
> DAG GUI.
>
>
> On Sun, Aug 19, 2012 at 5:43 PM, Lance Norskog <go...@gmail.com> wrote:
>>
>> Last checkin on Azkaban was 11 months ago:
>>
>> https://github.com/azkaban/azkaban/commit/b105570625bcb2002de1acf4012c8d0e4388470a
>>
>> But, the last checkin for Hamake was June 2010. And it's still a cool
>> little Hadoop/Pig scheduler.
>> http://hamake.googlecode.com/
>>
>> On Sun, Aug 19, 2012 at 2:49 PM, Michael Segel
>> <mi...@hotmail.com> wrote:
>> > There has been some work to replace the use of queues with HBase.
>> > This would be used to feed processes off the queue to help balance out
>> > the load on the cluster.
>> >
>> > In one specific use case, this was effective because the time spent
>> > processing each mapper.map() iteration is a couple of orders of magnitude
>> > greater than the time it takes to pull the data from the 'queue' and to
>> > each node for processing.
>> >
>> > Again, YMMV, it is an interesting hack though....
>> >
>> > On Aug 19, 2012, at 11:46 AM, Robert Nicholson
>> > <ro...@gmail.com> wrote:
>> >
>> >> We have an application, or a series of applications, that listen to
>> >> incoming feeds; they then distribute this data in XML form to a number of
>> >> queues. Another set of processes listens to these queues and processes the
>> >> messages. Order of processing is important insofar as related messages
>> >> need to be processed in sequence; hence, today all related messages go to
>> >> the same queue and are processed by the same queue consumer.
>> >>
>> >> The idea would be to replace the use of MQ with some kind of reliable
>> >> distributed dispatch. Does Hadoop provide that?
>> >>
>> >>
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goksron@gmail.com
>
>
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com



-- 
Lance Norskog
goksron@gmail.com

Re: Can Hadoop replace the use of MQ b/w processes?

Posted by Russell Jurney <ru...@gmail.com>.

Glad to hear about Hamake. FWIW, I've had good success with Azkaban in the
past for very complex, lengthy Hadoop/Pig/Streaming pipelines. It even has
a DAG GUI.

On Sun, Aug 19, 2012 at 5:43 PM, Lance Norskog <go...@gmail.com> wrote:

> Last checkin on Azkaban was 11 months ago:
>
> https://github.com/azkaban/azkaban/commit/b105570625bcb2002de1acf4012c8d0e4388470a
>
> But, the last checkin for Hamake was June 2010. And it's still a cool
> little Hadoop/Pig scheduler.
> http://hamake.googlecode.com/
>
> On Sun, Aug 19, 2012 at 2:49 PM, Michael Segel
> <mi...@hotmail.com> wrote:
> > There has been some work to replace the use of queues with HBase.
> > This would be used to feed processes off the queue to help balance out
> > the load on the cluster.
> >
> > In one specific use case, this was effective because the time spent
> > processing each mapper.map() iteration is a couple of orders of magnitude
> > greater than the time it takes to pull the data from the 'queue' and to
> > each node for processing.
> >
> > Again, YMMV, it is an interesting hack though....
> >
> > On Aug 19, 2012, at 11:46 AM, Robert Nicholson
> > <robert.nicholson@gmail.com> wrote:
> >
> >> We have an application, or a series of applications, that listen to
> >> incoming feeds; they then distribute this data in XML form to a number of
> >> queues. Another set of processes listens to these queues and processes the
> >> messages. Order of processing is important insofar as related messages
> >> need to be processed in sequence; hence, today all related messages go to
> >> the same queue and are processed by the same queue consumer.
> >>
> >> The idea would be to replace the use of MQ with some kind of reliable
> >> distributed dispatch. Does Hadoop provide that?
> >>
> >>
> >>
> >>
> >
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>



-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Can Hadoop replace the use of MQ b/w processes?

Posted by Lance Norskog <go...@gmail.com>.

Last checkin on Azkaban was 11 months ago:
https://github.com/azkaban/azkaban/commit/b105570625bcb2002de1acf4012c8d0e4388470a

But, the last checkin for Hamake was June 2010. And it's still a cool
little Hadoop/Pig scheduler.
http://hamake.googlecode.com/

On Sun, Aug 19, 2012 at 2:49 PM, Michael Segel
<mi...@hotmail.com> wrote:
> There has been some work to replace the use of queues with HBase.
> This would be used to feed processes off the queue to help balance out the load on the cluster.
>
> In one specific use case, this was effective because the time spent processing each mapper.map() iteration is a couple of orders of magnitude greater than the time it takes to pull the data from the 'queue' and to each node for processing.
>
> Again, YMMV, it is an interesting hack though....
>
> On Aug 19, 2012, at 11:46 AM, Robert Nicholson <ro...@gmail.com> wrote:
>
>> We have an application or a series of applications that listen to incoming feeds they then distribute this data in XML form to a number of queues.  Another set of processes listen to these queues and process the messages. Order of processing is important in so far as related messages need to be processed in sequence hence today all related messages go to the same queue and are processed by the same queue consumer.
>>
>> The idea would be replace the use of MQ with some kind of reliable distributed dispatch. Does Hadoop provide that?
>>
>>
>>
>>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Can Hadoop replace the use of MQ b/w processes?

Posted by Michael Segel <mi...@hotmail.com>.

There has been some work to replace the use of queues with HBase. 
This would be used to feed processes off the queue to help balance out the load on the cluster. 

In one specific use case, this was effective because the time spent processing each mapper.map() iteration is a couple of orders of magnitude greater than the time it takes to pull the data from the 'queue' and to each node for processing.

Again, YMMV, it is an interesting hack though....
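
As a rough illustration of the idea above, here is a minimal sketch of a sorted key-value table standing in for a queue. It is not HBase code: a plain in-memory Python structure plays the role of the table, and the class name, the "group|sequence" row-key layout and the take() method are all assumptions made for this example. The one property it relies on is that rows are kept sorted by row key, so a prefix scan returns one group's messages in sequence.

```python
import bisect


class SortedTableQueue:
    """In-memory stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> message payload

    def put(self, group, seq, payload):
        # Zero-padding makes lexicographic order match numeric order.
        row_key = f"{group}|{seq:010d}"
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = payload

    def take(self, group, limit):
        """Remove and return up to `limit` payloads for one group, oldest first."""
        prefix = f"{group}|"
        start = bisect.bisect_left(self._keys, prefix)
        taken = []
        while (start < len(self._keys)
               and len(taken) < limit
               and self._keys[start].startswith(prefix)):
            row_key = self._keys.pop(start)
            taken.append(self._rows.pop(row_key))
        return taken


queue = SortedTableQueue()
queue.put("acct-7", 2, "update")
queue.put("acct-9", 1, "open")
queue.put("acct-7", 1, "open")
first_batch = queue.take("acct-7", 10)
```

Related messages share a group prefix, so a single consumer that scans one prefix sees them in order, which is the property the original question asks for. A real table would also need some way to claim rows safely across concurrent consumers, which this sketch leaves out.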

On Aug 19, 2012, at 11:46 AM, Robert Nicholson <ro...@gmail.com> wrote:

> We have an application or a series of applications that listen to incoming feeds they then distribute this data in XML form to a number of queues.  Another set of processes listen to these queues and process the messages. Order of processing is important in so far as related messages need to be processed in sequence hence today all related messages go to the same queue and are processed by the same queue consumer.
> 
> The idea would be replace the use of MQ with some kind of reliable distributed dispatch. Does Hadoop provide that?
> 
> 
> 
>

Re: Can Hadoop replace the use of MQ b/w processes?

Posted by Russell Jurney <ru...@gmail.com>.

The model with Hadoop would be to aggregate your events and write them to
the Hadoop Distributed File System (HDFS), and then process them with
scheduled batch jobs via Hadoop MapReduce. If your requirements can
tolerate some latency, then Hadoop can work for you. Depending on your
processing, you can schedule jobs down to, say, every hour, half hour
or fifteen minutes. I'm not aware of anyone scheduling jobs more
frequently than that, but they may be. Chime in if you are.

For getting events to HDFS, look at Flume, Kafka and Scribe. For
processing events, look at Pig, Hive and Cascading. For scheduling
jobs, look at Oozie and Azkaban.
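
To make the batch model concrete, here is a small sketch in plain Python (no Hadoop APIs) of what one scheduled job would do to keep related messages in sequence: group the accumulated events by a correlation key, then order each group by sequence number before processing. The (key, sequence, payload) event shape and the function name process_batch are assumptions for illustration only; in a real MapReduce job the grouping would come from the shuffle and the ordering from a sort in the reducer.

```python
from collections import defaultdict


def process_batch(events):
    """Group events by correlation key and order each group by sequence number."""
    groups = defaultdict(list)
    for key, seq, payload in events:          # "map" step: route by key
        groups[key].append((seq, payload))
    ordered = {}
    for key, items in groups.items():         # "reduce" step: one key at a time
        items.sort(key=lambda item: item[0])  # restore per-key sequence
        ordered[key] = [payload for _, payload in items]
    return ordered


batch = [
    ("order-1", 2, "ship"),
    ("order-2", 1, "create"),
    ("order-1", 1, "create"),
]
result = process_batch(batch)
```

This only orders events within one batch. Ordering across batches would still depend on how the events are written to HDFS and on the job schedule.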

Russell Jurney http://datasyndrome.com

On Aug 19, 2012, at 9:47 AM, Robert Nicholson
<ro...@gmail.com> wrote:

> We have an application or a series of applications that listen to incoming feeds they then distribute this data in XML form to a number of queues.  Another set of processes listen to these queues and process the messages. Order of processing is important in so far as related messages need to be processed in sequence hence today all related messages go to the same queue and are processed by the same queue consumer.
>
> The idea would be replace the use of MQ with some kind of reliable distributed dispatch. Does Hadoop provide that?
>
>
>
