Posted to user@flume.apache.org by Sutanu Das <sd...@att.com> on 2015/08/21 22:44:34 UTC

Creating HDFS sink directories based on LogFile Pattern - POSSIBLE with Flume?

Hi Team,

We are asked to create HDFS directories in the HDFS sink based on a logfile pattern/topic. Is this possible with Flume interceptors / extractors / serializers out of the box?

Example: a single logfile has the following lines:

t=1440187845 ArubaPresence op="add" sta_mac="" associated="False" ap_name="a036000000kqVoW-02i6000000T5jrU"
t=1440187845 ArubaPresence op="add" sta_mac="" associated="False" ap_name="a036000000kqVoW-02i6000000T5jrU"
t=1440189388 ArubaRadio op="update" mac="04:bd:88:80:38:d0" ap_mac="04:bd:88:c0:03:8c" type="RADIO_PHY_TYPE_A_HT" mode="RADIO_MODE_AP"
t=1440189388 ArubaRadio op="update" mac="04:bd:88:80:38:c0" ap_mac="04:bd:88:c0:03:8c" type="RADIO_PHY_TYPE_A_HT_40" mode="RADIO_MODE_AP"


So is it possible to write each line from the single sample log above to a separate HDFS sink directory based on the keyword/pattern-topic (e.g. ArubaPresence and ArubaRadio)? It would look like this during the Flume HDFS sink write:


Creating /prod/hadoop/ArubaPresence/2015/08/21/20/Airwave_amp_2.1440189722272.tmp

Creating /prod/hadoop/ArubaRadio/2015/08/21/20/Airwave_amp_2.1440189722272.tmp


Re: Creating HDFS sink directories based on LogFile Pattern - POSSIBLE with Flume?

Posted by Roshan Naik <ro...@hortonworks.com>.
If you are using the SpoolDir source, take a look at its header-related config settings.
Otherwise, use an appropriate interceptor.
-roshan
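
For the case where each topic arrives in its own file, a minimal sketch of the SpoolDir header settings mentioned above might look like this (assuming a reasonably recent Flume 1.x; the agent, source, and channel names and the spool path are made up for illustration):

# Hypothetical agent "a1"; names and the spoolDir path are illustrative only.
a1.sources = r1
a1.channels = c1

a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = /var/log/airwave

# Header-related settings: tag every event with the name of the file it came from.
a1.sources.r1.basenameHeader = true
a1.sources.r1.basenameHeaderKey = sourceInfo

a1.channels.c1.type = memory

Note that this only distinguishes topics if ArubaPresence and ArubaRadio arrive in separate files; when they share one logfile, an interceptor (as discussed below in the thread) is the way to set the header per event.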


RE: Creating HDFS sink directories based on LogFile Pattern - POSSIBLE with Flume?

Posted by Sutanu Das <sd...@att.com>.
Thank you, Roshan. How do I set up a key/value in the header? Using the Regex Extractor Interceptor?

I can't find any example of key/value setup in the Flume docs/wiki/GitHub, sorry.
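
A rough sketch of the kind of Regex Extractor Interceptor setup being asked about (the agent/source names are placeholders, the header name sourceInfo follows the reply below, and the regex is untested against real Airwave output):

# Hypothetical agent "a1" / source "r1"; adjust names to match your config.
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_extractor

# Capture the word following the epoch timestamp, e.g. ArubaPresence or ArubaRadio.
# Backslashes are doubled because the value goes through Java properties parsing.
a1.sources.r1.interceptors.i1.regex = ^t=\\d+\\s+(\\w+)

# Store capture group 1 in the event header under the key "sourceInfo".
a1.sources.r1.interceptors.i1.serializers = s1
a1.sources.r1.interceptors.i1.serializers.s1.name = sourceInfo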


Re: Creating HDFS sink directories based on LogFile Pattern - POSSIBLE with Flume?

Posted by Roshan Naik <ro...@hortonworks.com>.
You can set up a key/value pair in the event header to indicate where the data is coming from, e.g. sourceInfo=ArubaRadio.

In the HDFS sink's path you can reference the sourceInfo header, e.g. /path/%{sourceInfo}/more. Take a look at the escape sequences in the HDFS Sink doc.
-roshan
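
Putting the pieces together, the HDFS sink side of this could look roughly as follows. This is a sketch under assumptions: the sourceInfo header is set by an interceptor such as the regex_extractor sketch earlier in the thread, and useLocalTimeStamp is enabled so the time escapes resolve without a timestamp header.

# Hypothetical agent "a1", sink "k1", channel "c1"; adjust to your config.
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1

# %{sourceInfo} is substituted per event, so ArubaPresence and ArubaRadio
# lines roll into separate directory trees; %Y/%m/%d/%H come from the event time.
a1.sinks.k1.hdfs.path = /prod/hadoop/%{sourceInfo}/%Y/%m/%d/%H
a1.sinks.k1.hdfs.filePrefix = Airwave_amp_2
a1.sinks.k1.hdfs.fileType = DataStream

# Use the agent's clock for the time escapes; alternatively add a timestamp
# interceptor, or extract t=<epoch> into a "timestamp" header.
a1.sinks.k1.hdfs.useLocalTimeStamp = true

With the header in place, an ArubaRadio event would then be written under a path like /prod/hadoop/ArubaRadio/2015/08/21/20/ while the file is open, matching the layout in the original question.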
