Posted to dev@metron.apache.org by mmiklavc <gi...@git.apache.org> on 2017/12/05 18:26:59 UTC
[GitHub] metron pull request #785: METRON-1230: As a stopgap prior to METRON-777, add...

Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/metron/pull/785#discussion_r155033012
  
    --- Diff: metron-platform/metron-parsers/3rdPartyParser.md ---
    @@ -0,0 +1,306 @@
    +# Custom Metron Parsers
    +
    +We have many stock parsers for normal operations.  Some of these are
    +networking and cybersecurity-focused (e.g. the ASA Parser), some of
    +these are general purpose (e.g. the CSVParser), but inevitably users
    +will want to extend the system to process their own data formats.  To
    +enable this, here is a walkthrough of how to create and use a custom
    +parser within Metron.
    +
    +# Writing A Custom Parser
    +Before we can use a custom parser, we need to write one.  The
    +parser is the workhorse of Metron ingest.  It maps the raw data
    +arriving as the Kafka message value to a `JSONObject`, the internal
    +data structure Metron uses for messages.
    +
    +## Implementation
    +
    +To create a custom parser, we need to do one of the following:
    +* Write a class which implements the `org.apache.metron.parsers.interfaces.MessageParser<JSONObject>` and `java.util.Serializable` interfaces
    +  * Implement `init()`, `configure(Map<String, Object> config)`, `validate(JSONObject message)`, and `List<JSONObject> parse(byte[] rawMessage)`
    +* Write a class which extends `org.apache.metron.parsers.BasicParser`
    +  * It provides a convenience implementation of `validate` which ensures that the `timestamp` and `original_string` fields exist, so you only need to implement `init()`, `configure(...)`, and `parse(...)`.
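As a rough, dependency-free illustration of the two options above, the sketch below mirrors the contract with hypothetical stand-ins: `SketchParser` plays the role of `MessageParser<JSONObject>`, `SketchBasicParser` plays the role of `BasicParser`, and messages are plain `Map<String, Object>` objects (json-simple's `JSONObject` is itself a `HashMap`). None of these names are Metron APIs; the point is only that extending a base class which supplies `validate` leaves `configure`, `init` and `parse` to you.

```java
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for MessageParser<JSONObject>; NOT the Metron interface.
interface SketchParser extends Serializable {
  void configure(Map<String, Object> config);
  void init();
  List<Map<String, Object>> parse(byte[] rawMessage);
  boolean validate(Map<String, Object> message);
}

// Hypothetical stand-in for BasicParser: it supplies validate(), so subclasses
// only implement configure(), init() and parse().
abstract class SketchBasicParser implements SketchParser {
  @Override
  public boolean validate(Map<String, Object> message) {
    // Mirrors the documented check: both required fields must be present.
    return message.containsKey("timestamp") && message.containsKey("original_string");
  }
}

// Minimal concrete parser that wraps each raw message in a single output message.
class WholeLineParser extends SketchBasicParser {
  @Override
  public void configure(Map<String, Object> config) { }

  @Override
  public void init() { }

  @Override
  public List<Map<String, Object>> parse(byte[] rawMessage) {
    Map<String, Object> msg = new HashMap<>();
    msg.put("original_string", new String(rawMessage, StandardCharsets.UTF_8));
    msg.put("timestamp", System.currentTimeMillis());
    return Collections.singletonList(msg);
  }
}
```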
    +
    +## Example
    +
    +To illustrate how this might be done, let's create a very
    +simple parser that takes a comma-separated pair and creates the
    +following fields:
    +* `original_string` -- the raw data
    +* `timestamp` -- the current time
    +* `first` -- the first field of the comma-separated pair
    +* `last` -- the last field of the comma-separated pair
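The field mapping above can be previewed with the JDK alone. This sketch uses `String.split(",", -1)`, which, like Guava's `Splitter.on(",")` in the parser below, keeps empty trailing fields. `PairMapping` and `toFields` are hypothetical names, and the timestamp is passed in so the output is deterministic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PairMapping {
  // Hypothetical helper: maps "a,b" onto the four fields described above.
  static Map<String, Object> toFields(String input, long now) {
    // A limit of -1 keeps trailing empty strings, like Guava's Splitter.on(",").
    String[] parts = input.split(",", -1);
    Map<String, Object> fields = new LinkedHashMap<>();
    fields.put("original_string", input);
    fields.put("timestamp", now);
    fields.put("first", parts[0]);
    fields.put("last", parts[parts.length - 1]);
    return fields;
  }

  public static void main(String[] args) {
    // prints {original_string=alpha,beta, timestamp=1000, first=alpha, last=beta}
    System.out.println(toFields("alpha,beta", 1000L));
  }
}
```

A single value with no comma (e.g. `solo`) yields the same string for both `first` and `last`.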
    +
    +For this demonstration, let's create a Maven project to build our
    +parser.  We'll call it `extra_parsers`, so in your workspace, set
    +up the Maven project:
    +* Create the Maven directory layout for `extra_parsers` via
    +```
    +mkdir -p extra_parsers/src/{main,test}/java
    +```
    +* Create a pom file indicating how we should build our parsers by
    +  editing `extra_parsers/pom.xml` with the following content:
    +```
    +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    +  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    +  <modelVersion>4.0.0</modelVersion>
    +  <groupId>com.3rdparty</groupId>
    +  <artifactId>extra-parsers</artifactId>
    +  <packaging>jar</packaging>
    +  <version>1.0-SNAPSHOT</version>
    +  <name>extra-parsers</name>
    +  <url>http://thirdpartysoftware.org</url>
    +  <properties>
    +    <!-- The Java version to compile for.  Metron targets Java 1.8 -->
    +    <java_version>1.8</java_version>
    +    <!-- The version of Metron that we'll be targeting. -->
    +    <metron_version>0.4.1</metron_version>
    +    <!-- To make the example realistic, we'll depend on a common third-party library -->
    +    <guava_version>19.0</guava_version>
    +    <!-- We will shade our dependencies to create a single jar at the end -->
    +    <shade_version>2.4.3</shade_version>
    +  </properties>
    +  <dependencies>
    +    <!--
    +    We want to depend on Metron, but ensure that the scope is "provided"
    +    as we do not want to include it in our bundle.
    +    -->
    +    <dependency>
    +      <groupId>org.apache.metron</groupId>
    +      <artifactId>metron-parsers</artifactId>
    +      <version>${metron_version}</version>
    +      <scope>provided</scope>
    +    </dependency>
    +    <dependency>
    +      <groupId>com.google.guava</groupId>
    +      <artifactId>guava</artifactId>
    +      <version>${guava_version}</version>
    +    </dependency>
    +    <dependency>
    +      <groupId>junit</groupId>
    +      <artifactId>junit</artifactId>
    +      <version>3.8.1</version>
    +      <scope>test</scope>
    +    </dependency>
    +  </dependencies>
    +  <build>
    +    <plugins>
    +     <!-- We will set up the shade plugin to create a single jar at the
    +           end of the build lifecycle.  We will exclude some things and
    +           relocate others to simulate a real situation.
    +           
    +           One thing to note is that it's good practice to shade and
    +           relocate common libraries that may also be dependencies of Metron.
    +           Your jar will be merged with the parsers jar, and for any overlapping
    +           classes the Metron version is the one that will be included.
    +           So, shade and relocate to ensure that YOUR version of the library is used.
    +      -->
    +
    +      <plugin>
    +        <groupId>org.apache.maven.plugins</groupId>
    +        <artifactId>maven-shade-plugin</artifactId>
    +        <version>${shade_version}</version>
    +        <configuration>
    +          <createDependencyReducedPom>true</createDependencyReducedPom>
    +          <artifactSet>
    +            <excludes>
    +              <!-- Exclude slf4j for no reason other than to illustrate how to exclude dependencies.
    +                   The metron team has nothing against slf4j. :-)
    +               -->
    +              <exclude>*slf4j*</exclude>
    +            </excludes>
    +          </artifactSet>
    +        </configuration>
    +        <executions>
    +          <execution>
    +            <phase>package</phase>
    +            <goals>
    +              <goal>shade</goal>
    +            </goals>
    +            <configuration>
    +              <shadedArtifactAttached>true</shadedArtifactAttached>
    +              <shadedClassifierName>uber</shadedClassifierName>
    +              <filters>
    +                <filter>
    +                  <!-- Signature files from signed dependencies can end up in the uber jar and make it fail signature verification, so drop them -->
    +                  <artifact>*:*</artifact>
    +                  <excludes>
    +                    <exclude>META-INF/*.SF</exclude>
    +                    <exclude>META-INF/*.DSA</exclude>
    +                    <exclude>META-INF/*.RSA</exclude>
    +                  </excludes>
    +                </filter>
    +              </filters>
    +              <relocations>
    +                <!-- Relocate Guava, as Metron also uses it and we specifically want version 19.0 -->
    +                <relocation>
    +                  <pattern>com.google</pattern>
    +                  <shadedPattern>com.thirdparty.guava</shadedPattern>
    +                </relocation>
    +              </relocations>
    +              <artifactSet>
    +                <excludes>
    +                  <!-- We can also exclude by artifactId and groupId -->
    +                  <exclude>storm:storm-core:*</exclude>
    +                  <exclude>storm:storm-lib:*</exclude>
    +                  <exclude>org.slf4j.impl*</exclude>
    +                  <exclude>org.slf4j:slf4j-log4j*</exclude>
    +                </excludes>
    +              </artifactSet>
    +            </configuration>
    +          </execution>
    +        </executions>
    +      </plugin>
    +      <!--
    +      We want to make sure we compile using Java 1.8.
    +      -->
    +      <plugin>
    +        <groupId>org.apache.maven.plugins</groupId>
    +        <artifactId>maven-compiler-plugin</artifactId>
    +        <version>3.5.1</version>
    +        <configuration>
    +          <forceJavacCompilerUse>true</forceJavacCompilerUse>
    +          <source>${java_version}</source>
    +          <compilerArgument>-Xlint:unchecked</compilerArgument>
    +          <target>${java_version}</target>
    +          <showWarnings>true</showWarnings>
    +        </configuration>
    +      </plugin>
    +    </plugins>
    +  </build>
    +</project>
    +```
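To make the effect of the `<relocation>` element concrete: at package time the shade plugin copies the Guava classes into the uber jar under `com.thirdparty.guava` and rewrites references in your own classes (such as `SimpleParser`'s use of `Splitter`) to match. The sketch below models only that class-name mapping with a hypothetical `relocate` helper; it is not the plugin's implementation and ignores resource paths and other details.

```java
final class RelocationSketch {
  // Hypothetical helper: rewrites a dotted class name whose package starts with `pattern`.
  static String relocate(String className, String pattern, String shadedPattern) {
    if (className.equals(pattern) || className.startsWith(pattern + ".")) {
      return shadedPattern + className.substring(pattern.length());
    }
    // Classes outside the pattern (e.g. Metron's own) are left untouched.
    return className;
  }

  public static void main(String[] args) {
    // prints com.thirdparty.guava.common.base.Splitter
    System.out.println(relocate("com.google.common.base.Splitter", "com.google", "com.thirdparty.guava"));
  }
}
```

Note that a pattern as broad as `com.google` relocates every bundled `com.google.*` library, not only Guava.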
    +* Now let's create our parser `com.thirdparty.SimpleParser` by creating the file `extra_parsers/src/main/java/com/thirdparty/SimpleParser.java` with the following content:
    +```
    +package com.thirdparty;
    +
    +import com.google.common.base.Splitter;
    +import com.google.common.collect.ImmutableList;
    +import com.google.common.collect.Iterables;
    +import org.apache.metron.parsers.BasicParser;
    +import org.json.simple.JSONObject;
    +
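    +import java.nio.charset.StandardCharsets;
    +import java.util.List;
    +import java.util.Map;
    +
    +public class SimpleParser extends BasicParser {
    +  @Override
    +  public void init() {
    +    // No one-time setup is needed for this simple parser.
    +  }
    +
    +  @Override
    +  @SuppressWarnings("unchecked") // JSONObject is a raw HashMap, so put() is unchecked
    +  public List<JSONObject> parse(byte[] bytes) {
    +    // Decode explicitly as UTF-8 rather than relying on the platform default charset.
    +    String input = new String(bytes, StandardCharsets.UTF_8);
    +    Iterable<String> it = Splitter.on(",").split(input);
    +    JSONObject ret = new JSONObject();
    +    // BasicParser's validate() expects these two fields on every message.
    +    ret.put("original_string", input);
    +    ret.put("timestamp", System.currentTimeMillis());
    +    ret.put("first", Iterables.getFirst(it, "missing"));
    +    ret.put("last", Iterables.getLast(it, "missing"));
    +    return ImmutableList.of(ret);
    +  }
    +
    +  @Override
    +  public void configure(Map<String, Object> map) {
    +    // This parser takes no configuration.
    +  }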
    +}
    +```
    +* Compile the parser via `mvn clean package` in `extra_parsers`
    +
    +This will create a jar containing your parser and its dependencies (sans Metron dependencies) in `extra_parsers/target/extra-parsers-1.0-SNAPSHOT-uber.jar`
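Before deploying, it can help to confirm what actually ended up in the uber jar. This JDK-only sketch (the class name `JarCheck` is hypothetical) lists the jar's entries with `java.util.jar.JarFile` and checks that the parser class is present, that Guava was relocated, and that no signature files survived the shade filter. It assumes the `com.thirdparty` names from the POM above; drop the Guava checks if your parser does not use Guava.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

final class JarCheck {
  // Returns the problems found in the jar; an empty list means all checks passed.
  static List<String> check(String jarPath) throws IOException {
    List<String> problems = new ArrayList<>();
    // We only list entry names, so signature verification is switched off.
    try (JarFile jar = new JarFile(jarPath, false)) {
      boolean hasParser = false;
      boolean hasRelocatedGuava = false;
      for (JarEntry e : Collections.list(jar.entries())) {
        String name = e.getName();
        if (name.equals("com/thirdparty/SimpleParser.class")) hasParser = true;
        if (name.startsWith("com/thirdparty/guava/")) hasRelocatedGuava = true;
        if (name.startsWith("com/google/")) problems.add("unrelocated class: " + name);
        // Stricter than the shade filter: flags signature files anywhere under META-INF/.
        if (name.startsWith("META-INF/")
            && (name.endsWith(".SF") || name.endsWith(".DSA") || name.endsWith(".RSA"))) {
          problems.add("signature file: " + name);
        }
      }
      if (!hasParser) problems.add("missing com/thirdparty/SimpleParser.class");
      if (!hasRelocatedGuava) problems.add("no relocated Guava classes under com/thirdparty/guava/");
    }
    return problems;
  }

  public static void main(String[] args) throws IOException {
    List<String> problems = check(args[0]);
    problems.forEach(System.out::println);
    System.out.println(problems.isEmpty() ? "OK" : problems.size() + " problem(s)");
  }
}
```

Run it with the jar path as its only argument, e.g. `java JarCheck extra_parsers/target/extra-parsers-1.0-SNAPSHOT-uber.jar`.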
    +
    +# Deploying Your Custom Parser
    +
    +To deploy your newly built custom parser, place the jar file above in the `$METRON_HOME/parser_contrib` directory on the Metron host (i.e. any host you start parsers from or, alternatively, the host where Metron REST runs).
    +
    +## Example
    +
    +Let's work through deploying the example above.
    +
    +### Preliminaries
    +
    +We assume that the following environment variables are set:
    +* `METRON_HOME` - the home directory for Metron
    +* `ZOOKEEPER` - the ZooKeeper quorum (comma-separated with port specified: e.g. `node1:2181` for full-dev)
    +* `BROKERLIST` - the Kafka broker list (comma-separated with port specified: e.g. `node1:6667` for full-dev)
    +* `ES_HOST` - the Elasticsearch master (and port), e.g. `node1:9200` for full-dev.
    +
    +Also, this assumes that you are not using a kerberized cluster.  If you are, then the parser start command will need to be adjusted slightly to include the security protocol.
    +
    +### Copy the jar file up
    +
    +Copy the jar file located at `extra_parsers/target/extra-parsers-1.0-SNAPSHOT-uber.jar` to `$METRON_HOME/parser_contrib` and ensure the permissions are such that the `metron` user can read and execute it.
    +
    +### Restart the REST service in Ambari
    +
    +In order for new parsers to be picked up, the REST service must be restarted.  You can do that from within Ambari by restarting the `Metron REST` service.
    +
    +### Create a Kafka Topic
    +
    +Create a Kafka topic, let's call it `test`, via:
    +`/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $ZOOKEEPER --create --topic test --partitions 1 --replication-factor 1`
    +
    +Note that in a real deployment, the topic would be named something more descriptive and would have its replication factor and partition count set to something less trivial.
    +
    +### Configure Test Parser
    +
    +Create a file called `$METRON_HOME/config/zookeeper/parsers/test.json` with the following content:
    +```
    +{
    +  "parserClassName":"com.thirdparty.SimpleParser",
    +  "sensorTopic":"test"
    +}
    +```
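If you prefer to script this step, the sketch below writes the same JSON to `$METRON_HOME/config/zookeeper/parsers/<sensor>.json` using `java.nio.file`. It assumes that the file name (minus `.json`) is treated as the sensor name when configs are pushed and the parser is started, which is why the file is named after the `test` topic; the `ParserConfigWriter` class and `writeParserConfig` helper are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ParserConfigWriter {
  // Hypothetical helper: writes a minimal parser config for `sensor` under metronHome.
  // Assumes the names contain no characters that need JSON escaping.
  static Path writeParserConfig(Path metronHome, String sensor, String parserClassName) throws IOException {
    Path dir = metronHome.resolve(Paths.get("config", "zookeeper", "parsers"));
    Files.createDirectories(dir);
    // Same two keys as the hand-written example; the topic name doubles as the sensor name.
    String json = "{\n"
        + "  \"parserClassName\":\"" + parserClassName + "\",\n"
        + "  \"sensorTopic\":\"" + sensor + "\"\n"
        + "}\n";
    Path file = dir.resolve(sensor + ".json");
    Files.write(file, json.getBytes(StandardCharsets.UTF_8));
    return file;
  }

  public static void main(String[] args) throws IOException {
    String home = System.getenv("METRON_HOME");
    if (home == null) {
      System.err.println("METRON_HOME is not set");
      return;
    }
    System.out.println("wrote " + writeParserConfig(Paths.get(home), "test", "com.thirdparty.SimpleParser"));
  }
}
```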
    +
    +### Start Parser
    --- End diff --
    
    Need to push config to Zookeeper before starting the parser.
    
    ```
    Now push the config to Zookeeper with the following command:
    $METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper/ -z $ZOOKEEPER
    
    ```


---