Posted to user@flume.apache.org by Simone Roselli <si...@plista.com> on 2016/01/06 17:33:55 UTC

Spooldir needs a Kafka topic defined in the agent.conf

Hi,

I'm having trouble configuring a spooldir source with the Kafka sink.

In Flume-NG I can use the Kafka sink without specifying a topic name in the agent.conf, since the event contains this topic name in the headers.

Things look different using the spooldir source. If you don't provide a topic name in agent.conf, it will only try a default one (default-flume-topic).

Is there a way to force the spooldir source to use the topic name in the headers?

PS: I'm using spooldir with the AVRO deserializer and no other particular configuration. "fileHeader" is set to "true".
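
For reference, the setup described above might look roughly like the following. This is only a sketch, assuming Flume 1.6 property names; the agent name a1, the component names, the spool directory path, and the broker address are hypothetical and not taken from the actual agent.conf.

```properties
# Hypothetical agent "a1": spooldir source -> memory channel -> Kafka sink
a1.sources = spool
a1.channels = ch
a1.sinks = kafka

# Spooling directory source with the AVRO deserializer and fileHeader enabled
a1.sources.spool.type = spooldir
a1.sources.spool.spoolDir = /path/to/spooldir
a1.sources.spool.deserializer = AVRO
a1.sources.spool.fileHeader = true
a1.sources.spool.channels = ch

a1.channels.ch.type = memory

# Kafka sink with no "topic" property set.
# Events without a "topic" header are published to default-flume-topic.
a1.sinks.kafka.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.kafka.brokerList = localhost:9092
a1.sinks.kafka.channel = ch
```

With fileHeader enabled, each event carries a "file" header holding the path of the file it came from, but nothing in this configuration sets a "topic" header.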


Many thanks


Simone Roselli
ITE Sysadmin
simone.roselli@plista.com
http://www.plista.com

Re: Spooldir needs a Kafka topic defined in the agent.conf

Posted by Simone Roselli <si...@plista.com>.
Hi,

I'll try to be as clear as I can:

I'm using

 * 2 sources = thrift, *spooldir*
 * 1 sink = kafka (no topic names configured in the agent.conf)

How are events sent to the proper topic?

Thrift writes a topic name (x) into the event's headers, then events are sent directly to kafka/x.

When a Thrift event is not sent for some reason, it is collected in a directory, /backup. Periodically, we move those events from /backup to /spooldir.

Instead of sending events to kafka/x, the spooldir source tries to send events to a "default-flume-topic".


Question: why are the event's headers not considered by the spooldir source, and what is a good way to fix this?
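
One possible explanation, offered as an assumption rather than something confirmed in this thread: the spooldir source builds fresh events from the file contents, so headers that the Thrift source attached to the original events are not restored as event headers. If every file in a given spool directory belongs to the same topic, a static interceptor can set the header the Kafka sink looks for. A minimal sketch, assuming Flume 1.6 property names; the agent name a1, the source name spool, and the topic value x are hypothetical:

```properties
# Attach a fixed "topic" header to every event read by the spooldir source.
# "a1" and "spool" are hypothetical names; "x" stands for the real topic.
a1.sources.spool.interceptors = setTopic
a1.sources.spool.interceptors.setTopic.type = static
a1.sources.spool.interceptors.setTopic.key = topic
a1.sources.spool.interceptors.setTopic.value = x
```

If several topics are involved, one option under the same assumptions is to use one spool directory and one spooldir source per topic, each with its own static interceptor value.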


Thanks in advance

Simone Roselli
ITE Sysadmin
simone.roselli@plista.com
http://www.plista.com

----- Original Message -----
From: "Keane, Mike" <mk...@conversantmedia.com>
To: "user" <us...@flume.apache.org>
Sent: Thursday, January 7, 2016 3:03:54 PM
Subject: RE: Spooldir needs a Kafka topic defined in the agent.conf

I assume you want to set the topic header based on the contents of the line of data read in from the Spooling Directory Source? If so, I think you want to configure a Regex Extractor Interceptor, or implement your own interceptor to do this.

http://flume.apache.org/FlumeUserGuide.html#regex-extractor-interceptor


________________________________________
From: Simone Roselli [simone.roselli@plista.com]
Sent: Thursday, January 07, 2016 6:52 AM
To: user
Subject: Re: Spooldir needs a Kafka topic defined in the agent.conf

Hi,

In your configuration you define the topic name in the agent.conf (spoolingAgent.sinks.kafka-sink-1.topic = data_in).

This is what I do not want.

I would like the spoolDir source to retrieve the topic name from the event headers.


Simone Roselli
ITE Sysadmin
simone.roselli@plista.com
http://www.plista.com

----- Original Message -----
From: "Keane, Mike" <mk...@conversantmedia.com>
To: "user" <us...@flume.apache.org>
Sent: Wednesday, January 6, 2016 6:30:41 PM
Subject: RE: Spooldir needs a Kafka topic defined in the agent.conf

I attempted to put together a little Flume+Kafka tutorial including using Camus to run map-reduce jobs pulling from Kafka and writing to HDFS.  My example uses a spoolDirSource, KafkaChannel & KafkaSink.  This may be of some help to you.

https://github.com/mbkeane/BigDataTechCon/blob/master/README.md



This email and any files included with it may contain privileged,
proprietary and/or confidential information that is for the sole use
of the intended recipient(s).  Any disclosure, copying, distribution,
posting, or use of the information contained in or attached to this
email is prohibited unless permitted by the sender.  If you have
received this email in error, please immediately notify the sender
via return email, telephone, or fax and destroy this original transmission
and its included files without reading or saving it in any manner.
Thank you.





RE: Spooldir needs a Kafka topic defined in the agent.conf

Posted by "Keane, Mike" <mk...@conversantmedia.com>.
I assume you want to set the topic header based on the contents of the line of data read in from the Spooling Directory Source? If so, I think you want to configure a Regex Extractor Interceptor, or implement your own interceptor to do this.

http://flume.apache.org/FlumeUserGuide.html#regex-extractor-interceptor
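
A minimal sketch of the Regex Extractor Interceptor approach, assuming each event body contains a text fragment such as topic=x. That body format is hypothetical, since the thread does not say what the bodies look like. The agent name a1 and source name spool are also hypothetical, and the property names are those of Flume 1.6.

```properties
# Copy the first regex capture group from the event body into a "topic" header.
# Assumes the body contains text like "topic=x" (hypothetical format).
a1.sources.spool.interceptors = topicFromBody
a1.sources.spool.interceptors.topicFromBody.type = regex_extractor
a1.sources.spool.interceptors.topicFromBody.regex = topic=([A-Za-z0-9._-]+)
a1.sources.spool.interceptors.topicFromBody.serializers = s1
a1.sources.spool.interceptors.topicFromBody.serializers.s1.name = topic
```

Events whose body does not match the regex get no "topic" header, so they would still go to the topic configured on the sink, or to default-flume-topic if none is set.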




Re: Spooldir needs a Kafka topic defined in the agent.conf

Posted by Simone Roselli <si...@plista.com>.
Hi,

In your configuration you define the topic name in the agent.conf (spoolingAgent.sinks.kafka-sink-1.topic = data_in).

This is what I do not want.

I would like the spoolDir source to retrieve the topic name from the event headers.


Simone Roselli
ITE Sysadmin
simone.roselli@plista.com
http://www.plista.com


RE: Spooldir needs a Kafka topic defined in the agent.conf

Posted by "Keane, Mike" <mk...@conversantmedia.com>.
I attempted to put together a little Flume+Kafka tutorial including using Camus to run map-reduce jobs pulling from Kafka and writing to HDFS.  My example uses a spoolDirSource, KafkaChannel & KafkaSink.  This may be of some help to you. 

https://github.com/mbkeane/BigDataTechCon/blob/master/README.md


