Posted to user@flume.apache.org by Juan Gentile <ju...@globant.com> on 2012/10/09 20:03:38 UTC

Flume-ng - Distributed

Hi,

I'm new to Flume-ng and would like to ask how I can run an agent
distributed across a cluster. I've developed my own source and sink: the
source reads from a queue and the sink stores the messages it reads to
HDFS. If I want this running in multiple instances, do I have to submit
it on each node?

This is my conf file:
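# In-memory channel that buffers events between the source and the sink
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 1000

# Custom source that reads from the queue
agent1.sources.source1.channels = channel1
agent1.sources.source1.type = MySource

# Custom sink that writes the messages to HDFS
agent1.sinks.sink1.channel = channel1
agent1.sinks.sink1.type = MySink

# Components that make up agent1
agent1.channels = channel1
agent1.sources = source1
agent1.sinks = sink1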


I see that the previous version of Flume had the concepts of 'master' and
'node'. Is there something similar here?

Thanks,
Juan

Re: Flume-ng - Distributed

Posted by Juan Gentile <ju...@globant.com>.

Great, thanks, this clears everything up!

On Wed, Oct 10, 2012 at 4:45 PM, Harish Mandala <mv...@gmail.com> wrote:

> You would use something like puppet or chef to push such config files
> around.
>
> Regards,
> Harish

Re: Flume-ng - Distributed

Posted by Harish Mandala <mv...@gmail.com>.

You would use something like puppet or chef to push such config files
around.

Regards,
Harish

On Wed, Oct 10, 2012 at 2:19 PM, Camp, Roy <rc...@ebay.com> wrote:

> You have to manually start each node with its specific configuration.
> However, you can put the configuration for all your different setups into
> one config file, but you will still need to place a copy of it on every
> machine.  Simply define which agent config to use with the --name param
> when starting.
>
> Thanks,
> Roy

Re: Flume-ng - Distributed

Posted by Hari Shreedharan <hs...@cloudera.com>.

Most commonly this is done using something like puppet or chef. Like Roy said, you can use the same config file with different agent names, so you can deploy the same file and still have different configurations for the agents on different machines.


Hari 

-- 
Hari Shreedharan


On Wednesday, October 10, 2012 at 11:19 AM, Camp, Roy wrote:

> You have to manually start each node with its specific configuration.  However, you can put the configuration for all your different setups into one config file, but you will still need to place a copy of it on every machine.  Simply define which agent config to use with the --name param when starting.  
>  
> Thanks,
>  
> Roy

RE: Flume-ng - Distributed

Posted by "Camp, Roy" <rc...@ebay.com>.

You have to manually start each node with its specific configuration.  However, you can put the configuration for all your different setups into one config file, but you will still need to place a copy of it on every machine.  Simply define which agent config to use with the --name param when starting.
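
For illustration, a minimal sketch of such a shared file might look like the following. The agent names "node" and "collector", the host name collector.example.com, the port 4141 and the file name conf/flume.conf are assumptions made up for this example; MySource and MySink are the custom classes from the original question. Each agent reads only the properties under its own name, so the same file can be copied to every machine.

```properties
# conf/flume.conf (hypothetical path), identical on every machine

# Agent "node" (assumed name): runs on each machine that reads the queue
node.sources = source1
node.channels = channel1
node.sinks = sink1
node.sources.source1.type = MySource
node.sources.source1.channels = channel1
node.channels.channel1.type = memory
# Avro sink forwards events to the collector (placeholder host and port)
node.sinks.sink1.type = avro
node.sinks.sink1.hostname = collector.example.com
node.sinks.sink1.port = 4141
node.sinks.sink1.channel = channel1

# Agent "collector" (assumed name): runs on the collector machine
collector.sources = source1
collector.channels = channel1
collector.sinks = sink1
# Avro source listens for events sent by the node agents
collector.sources.source1.type = avro
collector.sources.source1.bind = 0.0.0.0
collector.sources.source1.port = 4141
collector.sources.source1.channels = channel1
collector.channels.channel1.type = memory
collector.sinks.sink1.type = MySink
collector.sinks.sink1.channel = channel1

# Start commands, picking the agent with --name:
#   bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name node
#   bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name collector
```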

Thanks,

Roy

From: Juan Gentile [mailto:juan.gentile@globant.com]
Sent: Wednesday, October 10, 2012 9:54 AM
To: user@flume.apache.org
Subject: Re: Flume-ng - Distributed

Thank you both very much. I've been reading the documentation you sent me, and it raises another question: is there a way to submit my Flume configuration to a cluster, or do I have to manually start up each node with its specific configuration?

Thank you!

Re: Flume-ng - Distributed

Posted by Juan Gentile <ju...@globant.com>.

Thank you both very much. I've been reading the documentation you sent me,
and it raises another question: is there a way to submit my Flume
configuration to a cluster, or do I have to manually start up each node
with its specific configuration?

Thank you!

On Wed, Oct 10, 2012 at 1:51 AM, Mike Percy <mp...@apache.org> wrote:

> +1 on what Roy said, with a minor terminology quibble: in Flume NG the
> Avro collector component is called the Avro Source.
>
> Also, here are links to the docs with working image links and table of
> contents:
>
> http://flume.apache.org/FlumeUserGuide.html
> http://flume.apache.org/FlumeDeveloperGuide.html
>
> Regards,
> Mike

Re: Flume-ng - Distributed

Posted by Mike Percy <mp...@apache.org>.

+1 on what Roy said, with a minor terminology quibble: in Flume NG the Avro
collector component is called the Avro Source.

Also, here are links to the docs with working image links and table of
contents:

http://flume.apache.org/FlumeUserGuide.html
http://flume.apache.org/FlumeDeveloperGuide.html

Regards,
Mike


On Tue, Oct 9, 2012 at 5:52 PM, Camp, Roy <rc...@ebay.com> wrote:

> You would run a flume-ng instance on each node with an avro-sink.  Then
> on your collector machine you will run another flume-ng instance with an
> avro-collector.
>
> If you run more than one collector you can setup sink groups and define
> that it does failover or load balancing.
>
> The concept of a flume master from flume 0.9.x does not exist on
> flume-ng.  I personally use the node and collector configs in the same
> config file under a different agent name, and then keep them synced on all
> machines.
>
> These two docs are pretty helpful:
>
> https://github.com/apache/flume/blob/trunk/flume-ng-doc/sphinx/FlumeUserGuide.rst
> https://github.com/apache/flume/blob/trunk/flume-ng-doc/sphinx/FlumeDeveloperGuide.rst
>
> Thanks,
> Roy

Re: Flume-ng - Distributed

Posted by iain wright <ia...@gmail.com>.

I don't mean to hijack the thread, but is this tiered approach recommended
over reading from a local queue and having 10 or so nodes write directly to
hbase when using the async hbase sink?

-- 
Iain Wright


On Tue, Oct 9, 2012 at 5:52 PM, Camp, Roy <rc...@ebay.com> wrote:

> You would run a flume-ng instance on each node with an avro-sink.  Then
> on your collector machine you will run another flume-ng instance with an
> avro-collector.
>
> If you run more than one collector you can setup sink groups and define
> that it does failover or load balancing.
>
> The concept of a flume master from flume 0.9.x does not exist on
> flume-ng.  I personally use the node and collector configs in the same
> config file under a different agent name, and then keep them synced on all
> machines.
>
> These two docs are pretty helpful:
>
> https://github.com/apache/flume/blob/trunk/flume-ng-doc/sphinx/FlumeUserGuide.rst
> https://github.com/apache/flume/blob/trunk/flume-ng-doc/sphinx/FlumeDeveloperGuide.rst
>
> Thanks,
> Roy

RE: Flume-ng - Distributed

Posted by "Camp, Roy" <rc...@ebay.com>.

You would run a flume-ng instance on each node with an avro-sink.  Then on your collector machine you will run another flume-ng instance with an avro-collector.

If you run more than one collector you can set up sink groups and configure them for failover or load balancing.
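
As a rough sketch of a sink group, assuming two collectors at the placeholder hosts collector1.example.com and collector2.example.com on port 4141, and reusing the agent1, source1 and channel1 names from the original conf file (the group name group1 and the second sink sink2 are made up for this example), the node side might look like this:

```properties
# Node agent with two Avro sinks in one sink group (hosts and port are placeholders)
agent1.sources = source1
agent1.channels = channel1
agent1.sinks = sink1 sink2
agent1.sinkgroups = group1

agent1.sources.source1.type = MySource
agent1.sources.source1.channels = channel1
agent1.channels.channel1.type = memory

agent1.sinks.sink1.type = avro
agent1.sinks.sink1.hostname = collector1.example.com
agent1.sinks.sink1.port = 4141
agent1.sinks.sink1.channel = channel1

agent1.sinks.sink2.type = avro
agent1.sinks.sink2.hostname = collector2.example.com
agent1.sinks.sink2.port = 4141
agent1.sinks.sink2.channel = channel1

# Failover: the sink with the larger priority value is used first,
# and the other takes over when it fails
agent1.sinkgroups.group1.sinks = sink1 sink2
agent1.sinkgroups.group1.processor.type = failover
agent1.sinkgroups.group1.processor.priority.sink1 = 10
agent1.sinkgroups.group1.processor.priority.sink2 = 5
agent1.sinkgroups.group1.processor.maxpenalty = 10000

# Load-balancing alternative (replace the four processor lines above):
# agent1.sinkgroups.group1.processor.type = load_balance
# agent1.sinkgroups.group1.processor.selector = round_robin
```

Which processor fits depends on the setup: failover keeps one collector as a standby, while load balancing spreads events across all of them.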

The concept of a flume master from flume 0.9.x does not exist on flume-ng.  I personally use the node and collector configs in the same config file under a different agent name, and then keep them synced on all machines.

These two docs are pretty helpful:
https://github.com/apache/flume/blob/trunk/flume-ng-doc/sphinx/FlumeUserGuide.rst
https://github.com/apache/flume/blob/trunk/flume-ng-doc/sphinx/FlumeDeveloperGuide.rst

Thanks,

Roy



