Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2009/11/10 23:30:40 UTC
[Hadoop Wiki] Update of "Sending_information_to_Chukwa" by AriRabkin
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "Sending_information_to_Chukwa" page has been changed by AriRabkin.
The comment on this change is: move demux stuff to new page.
http://wiki.apache.org/hadoop/Sending_information_to_Chukwa?action=diff&rev1=3&rev2=4
--------------------------------------------------
* Where <fileName> is the local path on your machine
1. Close the socket
-
- == Extract information from this new dataSource ==
-
- === Using the default TimeStamp Parser ===
-
- By default, Chukwa uses the TsProcessor.
-
- This parser tries to extract the log timestamp from the log entry using the %d{ISO8601} date format.
- If that fails, it falls back to the time at which the chunk was written to disk (the collector timestamp).
-
- Your log will automatically be available from the Web Log viewer under the <YourRecordTypeHere> directory.
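The fallback described above can be sketched with a small, stdlib-only illustration. This is an assumed reading of the behavior, not the actual TsProcessor source; `extractTime` and its arguments are hypothetical names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class TimestampSketch {
    // Hypothetical helper illustrating the fallback: try the leading
    // %d{ISO8601} timestamp ("yyyy-MM-dd HH:mm:ss,SSS", 23 characters);
    // if it cannot be parsed, fall back to the collector timestamp.
    static long extractTime(String logEntry, long collectorTimestamp) {
        if (logEntry == null || logEntry.length() < 23) {
            return collectorTimestamp;
        }
        SimpleDateFormat iso8601 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
        try {
            return iso8601.parse(logEntry.substring(0, 23)).getTime();
        } catch (ParseException e) {
            return collectorTimestamp; // chunk write time
        }
    }

    public static void main(String[] args) {
        System.out.println(extractTime("2009-11-10 23:30:40,123 INFO MyClass - started", 0L));
        System.out.println(extractTime("no leading timestamp on this line", 42L));
    }
}
```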
-
- === Using a specific Parser ===
- If you want to extract specific information and perform additional processing, you need to write your own parser.
- Like any M/R program, you have to write at least the Map side of your parser. The Reduce side is the identity function by default.
-
- ==== MAP side of the parser ====
- You can write your own parser from scratch, or extend the AbstractProcessor class, which hides all the low-level operations on the chunk.
- Then you have to register your parser with the demux (the link between the RecordType and the parser).
-
- ==== Parser registration ====
- * Edit ${CHUKWA_HOME}/conf/chukwa-demux-conf.xml and add the following lines
-
- <property>
-   <name><YourRecordType_Here></name>
-   <value>org.apache.hadoop.chukwa.extraction.demux.processor.mapper.MyParser</value>
-   <description>Parser class for <YourRecordType_Here></description>
- </property>
-
- (Tip: you can use the same parser for different record types.)
-
- ==== Parser implementation ====
-
- {{{#!java
-
- public class MyParser extends AbstractProcessor
- {
-   protected void parse(String recordEntry,
-                        OutputCollector<ChukwaRecordKey, ChukwaRecord> output,
-                        Reporter reporter) throws Throwable
-   {
-     // Extract the Log4j fields: timestamp, logLevel, logger, ...
-     // A %d{ISO8601} timestamp is 23 characters: "yyyy-MM-dd HH:mm:ss,SSS"
-     SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
-     String dStr = recordEntry.substring(0, 23);
-     int start = 24;
-     int idx = recordEntry.indexOf(' ', start);
-     String logLevel = recordEntry.substring(start, idx);
-     start = idx + 1;
-     idx = recordEntry.indexOf(' ', start);
-     String className = recordEntry.substring(start, idx);
-     String body = recordEntry.substring(idx + 1);
-
-     Date d = sdf.parse(dStr);
-
-     key = new ChukwaRecordKey();
-     key.setKey("<YOUR_KEY_HERE>");
-     key.setReduceType("<YOUR_RECORD_TYPE_HERE>");
-
-     record = new ChukwaRecord();
-     record.setTime(d.getTime());
-
-     // Parse your line here, extract useful information,
-     // and add your {key, value} pairs
-     record.add(key1, value1);
-     record.add(key2, value2);
-     record.add(key3, value3);
-
-     // Output your record
-     output.collect(key, record);
-   }
- }
-
-
- }}}
-
- (Tip: see the org.apache.hadoop.chukwa.extraction.demux.processor.mapper.Df class for an example of a parser class.)
-
- ==== REDUCE side of the parser ====
- You only need to implement a Reduce side if you need to group records together.
- The link between the Map side and the Reduce side is made by setting your Reduce class as the reduce type: key.setReduceType("<YourReduceClassHere>");
-
- Here is the interface that you need to implement:
-
- {{{#!java
- public interface ReduceProcessor
- {
- public String getDataType();
- public void process(ChukwaRecordKey key,Iterator<ChukwaRecord> values,
- OutputCollector<ChukwaRecordKey,
- ChukwaRecord> output, Reporter reporter);
- }
- }}}
-
- (Tip: see the org.apache.hadoop.chukwa.extraction.demux.processor.reducer.SystemMetrics class for an example of a Reduce class.)
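The grouping that a custom Reduce side performs can be illustrated without the Chukwa classes. In this stdlib-only sketch, a plain `Map<String, String>` stands in for a ChukwaRecord, and `mergeRecords` is a hypothetical helper, not part of the Chukwa API: all records sharing a key are folded into one output record.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class ReduceSketch {
    // Illustrative stand-in for a Reduce side: merge every record in the
    // group (all values for one key) into a single output record.
    static Map<String, String> mergeRecords(Iterator<Map<String, String>> values) {
        Map<String, String> merged = new HashMap<String, String>();
        while (values.hasNext()) {
            merged.putAll(values.next()); // later records win on field clashes
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Map<String, String>> group = new ArrayList<Map<String, String>>();
        Map<String, String> a = new HashMap<String, String>();
        a.put("cpu", "42");
        Map<String, String> b = new HashMap<String, String>();
        b.put("mem", "77");
        group.add(a);
        group.add(b);
        System.out.println(mergeRecords(group.iterator()));
    }
}
```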
-
- ==== Parser key field ====
-
- Your data will be sorted by RecordType, then by the key field.
- The default implementation uses the following grouping for all records:
- 1. Time partition (Time up to the hour)
- 1. Machine name (physical input source)
- 1. Record timestamp
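The default grouping above can be sketched with a small stdlib-only example. `buildKey` and the '/'-separated layout are illustrative assumptions, not the exact Chukwa key format; the point is only the ordering: time partition first, then machine, then the full record timestamp.

```java
public class KeySketch {
    // Illustrative only: compose a sort key from the time partition
    // (timestamp truncated down to the hour), the machine name, and
    // the full record timestamp, in that order.
    static String buildKey(long timestampMs, String machine) {
        long hourMs = 60L * 60L * 1000L;
        long timePartition = (timestampMs / hourMs) * hourMs;
        return timePartition + "/" + machine + "/" + timestampMs;
    }

    public static void main(String[] args) {
        // All records from the same hour and machine share a key prefix,
        // so they sort next to each other.
        System.out.println(buildKey(1257894123456L, "host01"));
    }
}
```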
-
- ==== Output directory ====
- The demux process uses the recordType to store records of the same type together in the same directory:
- <Your_Cluster_Information>/<Your_Record_Type>/
-