Posted to dev@metron.apache.org by mmiklavc <gi...@git.apache.org> on 2016/09/16 04:56:33 UTC

[GitHub] incubator-metron pull request #256: Metron 257 Enable pcap result pagination...

GitHub user mmiklavc opened a pull request:

    https://github.com/apache/incubator-metron/pull/256

    Metron 257 Enable pcap result pagination from the Pcap CLI

    This builds on efforts from [https://github.com/apache/incubator-metron/pull/217](https://github.com/apache/incubator-metron/pull/217)
    
    This closes [https://issues.apache.org/jira/browse/METRON-257](https://issues.apache.org/jira/browse/METRON-257)
    
    The purpose of this PR is to let the user specify how many records per file the PCAP CLI tool should write. For example, if a PCAP query returns 1,000 records and the user specifies 200 records per file, then the user should expect 5 PCAP files to be written to the current working directory.
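    
    The arithmetic in that example can be sketched as follows. This is only an illustration of the paging idea, not the code in this PR: `PaginationSketch`, `fileCount` and `paginate` are hypothetical names, and the records are stand-in integers rather than real packets.
    
    ```java
    import java.util.ArrayList;
    import java.util.List;
    
    // Hypothetical helper for illustration; not a class from the Metron code base.
    final class PaginationSketch {
    
      // Number of output files needed: the ceiling of totalRecords / recordsPerFile.
      static int fileCount(int totalRecords, int recordsPerFile) {
        if (recordsPerFile <= 0) {
          throw new IllegalArgumentException("recordsPerFile must be positive");
        }
        return (totalRecords + recordsPerFile - 1) / recordsPerFile;
      }
    
      // Splits the records into consecutive pages of at most recordsPerFile entries.
      static <T> List<List<T>> paginate(List<T> records, int recordsPerFile) {
        if (recordsPerFile <= 0) {
          throw new IllegalArgumentException("recordsPerFile must be positive");
        }
        List<List<T>> pages = new ArrayList<>();
        for (int start = 0; start < records.size(); start += recordsPerFile) {
          int end = Math.min(start + recordsPerFile, records.size());
          pages.add(new ArrayList<>(records.subList(start, end)));
        }
        return pages;
      }
    
      public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
          records.add(i);
        }
        List<List<Integer>> pages = paginate(records, 200);
        // 1,000 records at 200 per file gives 5 pages of 200 records each.
        System.out.println(pages.size() + " files, last file holds "
            + pages.get(pages.size() - 1).size() + " records");
      }
    }
    ```
    
    When the total is not an exact multiple of the page size, only the last page comes up short, which is why the testing steps expect the final file to hold <= the requested number of records.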
    
    **Testing**
    Get PCAP data into Metron: Install and set up pycapa - the instructions below reference/mirror those in [PR-93](https://github.com/apache/incubator-metron/pull/93#issue-151705836)
    
    1. Install the pycapa library & utility ```$ cd /opt/pycapa/pycapa && pip install -r requirements.txt && python setup.py install```
    2. (If using single-node Vagrant) Kill the enrichment and sensor topologies via `for i in bro enrichment yaf snort;do storm kill $i;done`
    3. Start the pcap topology via `/usr/metron/0.2.0BETA/bin/start_pcap_topology.sh`
    4. Start the pycapa packet capture producer on eth1 via `/usr/bin/pycapa --producer --topic pcap -i eth1 -k node1:6667`
    5. Watch the topology in the Storm UI and, once more than 3k packets have been ingested, kill the packet capture producer started in the previous step.
    6. Ensure that at least 3 files exist on HDFS by running `hadoop fs -ls /apps/metron/pcap`
    7. Choose a file (denoted by $FILE) and dump a few of its contents using the pcap_inspector utility via `/usr/metron/0.2.0BETA/bin/pcap_inspector.sh -i $FILE -n 5`
    8. Choose one of the lines and note the protocol.
    9. Note that when you run the commands below, the resulting file will be placed in the execution directory where you kicked off the job from.
    
    ### Fixed filter
    
    1. Run a fixed filter query by executing the following command with the values noted above (match your start_time format to the date format provided - default is to use millis since epoch)
    2. `/usr/metron/0.2.0BETA/bin/pcap_query.sh fixed -st <start_time> -df "yyyyMMdd" -p <protocol_num> -rpf 500`
    3. Verify the MR job finishes successfully. Upon completion, you should see multiple files named with relatively current datestamps in your current directory, e.g. pcap-data-20160617160549737+0000.pcap
    4. Copy the files to your local machine and verify you can open them in Wireshark. I chose a middle file and the last file. The middle file should have 500 records (per the records_per_file option), and the last one will likely have a number of records <= 500.
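    
    If you would rather check the record count without opening Wireshark, a rough sketch like the one below can count the records in any of the output files. It assumes the files are in the classic libpcap layout (a 24-byte global header followed by a 16-byte header plus captured bytes per record); `PcapRecordCounter` is a hypothetical helper and not part of this PR.
    
    ```java
    import java.io.DataInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    
    // Hypothetical verification helper; not part of the Metron code base.
    final class PcapRecordCounter {
    
      // Counts packet records in a classic libpcap stream.
      static int countRecords(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] global = new byte[24];
        data.readFully(global);
    
        // The magic number reveals the byte order the file was written in.
        int magic = ByteBuffer.wrap(global).order(ByteOrder.BIG_ENDIAN).getInt(0);
        ByteOrder order;
        if (magic == 0xa1b2c3d4 || magic == 0xa1b23c4d) {
          order = ByteOrder.BIG_ENDIAN;
        } else if (magic == 0xd4c3b2a1 || magic == 0x4d3cb2a1) {
          order = ByteOrder.LITTLE_ENDIAN;
        } else {
          throw new IOException("Not a classic pcap file");
        }
    
        byte[] header = new byte[16];
        int count = 0;
        int first;
        // A clean end of file is only allowed on a record boundary.
        while ((first = data.read()) != -1) {
          header[0] = (byte) first;
          data.readFully(header, 1, 15);
          // incl_len (offset 8) is the number of captured bytes that follow.
          int inclLen = ByteBuffer.wrap(header).order(order).getInt(8);
          if (inclLen < 0) {
            throw new IOException("Corrupt record length");
          }
          data.readFully(new byte[inclLen]);
          count++;
        }
        return count;
      }
    
      public static void main(String[] args) throws IOException {
        if (args.length != 1) {
          System.err.println("usage: PcapRecordCounter <file.pcap>");
          return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
          System.out.println(countRecords(in));
        }
      }
    }
    ```
    
    Run against a middle file with `-rpf 500`, this should print 500 if the assumptions above hold.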
    
    ### Query filter
    
    1. Run a Stellar query filter query by executing a command similar to the following, with the values noted above (match your start_time format to the date format provided - default is to use millis since epoch)
    2. `/usr/metron/0.2.0BETA/bin/pcap_query.sh query -st "20160617" -df "yyyyMMdd" -query "protocol == '6'"  -rpf 500`
    3. Verify the MR job finishes successfully. Upon completion, you should see multiple files named with relatively current datestamps in your current directory, e.g. pcap-data-20160617160549737+0000.pcap
    4. Copy the files to your local machine and verify you can open them in Wireshark. I chose a middle file and the last file. The middle file should have 500 records (per the records_per_file option), and the last one will likely have a number of records <= 500.
    
    **References:**
    - https://github.com/apache/incubator-metron/pull/217
    - https://github.com/apache/incubator-metron/pull/156
    - https://github.com/apache/incubator-metron/pull/93#issue-151705836
    - https://github.com/apache/incubator-metron/tree/master/metron-sensors/pycapa

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mmiklavc/incubator-metron METRON-257

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/256.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #256
    
----
commit a06fd0b918bfd4d363d58ffb5c5c2dba015e7c57
Author: Michael Miklavcic <mi...@gmail.com>
Date:   2016-09-15T01:54:50Z

    Refactor pcap cli tests. Fix num reducers

commit cc06349495315a6e1e7b393cd4871bf3e855bc3b
Author: Michael Miklavcic <mi...@gmail.com>
Date:   2016-09-16T04:43:43Z

    METRON-257 Enable pcap result pagination from the Pcap CLI

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #256: Metron 257 Enable pcap result pagination from t...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/256
  
    FYI: That test nondeterminism should be fixed as of METRON-426 aka #257 



[GitHub] incubator-metron issue #256: Metron 257 Enable pcap result pagination from t...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/incubator-metron/pull/256
  
    CI failure seems unrelated.  Probably something we need to address, but a re-run will probably fix it for this PR.
    
    ```
    Failed tests: 
      StellarStatisticsFunctionsTest.testMergeProviders:215 Percentile mismatch for 
    60.0th %ile expected:<0.22611711437989881> but was:<0.23631231837333944>
    ```



[GitHub] incubator-metron pull request #256: Metron 257 Enable pcap result pagination...

Posted by justinleet <gi...@git.apache.org>.
Github user justinleet commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/256#discussion_r79158991
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/hadoop/SequenceFileIterable.java ---
    @@ -0,0 +1,138 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.metron.common.hadoop;
    +
    +import com.google.common.collect.Iterators;
    +import org.apache.hadoop.conf.Configuration;
    +import org.apache.hadoop.fs.FileSystem;
    +import org.apache.hadoop.fs.Path;
    +import org.apache.hadoop.io.BytesWritable;
    +import org.apache.hadoop.io.LongWritable;
    +import org.apache.hadoop.io.SequenceFile;
    +import org.apache.log4j.Logger;
    +
    +import java.io.IOException;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.NoSuchElementException;
    +
    +import static java.lang.String.format;
    +
    +public class SequenceFileIterable implements Iterable<byte[]> {
    +  private static final Logger LOGGER = Logger.getLogger(SequenceFileIterable.class);
    +  private List<Path> files;
    +  private Configuration config;
    +
    +  public SequenceFileIterable(List<Path> files, Configuration config) {
    +    this.files = files;
    +    this.config = config;
    +  }
    +
    +  @Override
    +  public Iterator<byte[]> iterator() {
    +    return Iterators.concat(getIterators(files, config));
    +  }
    +
    +  private Iterator<byte[]>[] getIterators(List<Path> files, Configuration config) {
    +    return files.stream().map(f -> new SequenceFileIterator(f, config)).toArray(Iterator[]::new);
    +  }
    +
    +  /**
    +   * Cleans up all files read by this Iterable
    +   *
    +   * @return true if success, false if any files were not deleted
    +   * @throws IOException
    +   */
    +  public boolean cleanup() throws IOException {
    +    FileSystem fileSystem = FileSystem.get(config);
    +    boolean success = true;
    +    for (Path file : files) {
    +      success &= fileSystem.delete(file, false);
    +    }
    +    return success;
    +  }
    +
    +  private static class SequenceFileIterator implements Iterator<byte[]> {
    +    private Path path;
    +    private Configuration config;
    +    private SequenceFile.Reader reader;
    +    private LongWritable key = new LongWritable();
    +    private BytesWritable value = new BytesWritable();
    +    private byte[] next;
    +    private boolean finished = false;
    +
    +    public SequenceFileIterator(Path path, Configuration config) {
    +      this.path = path;
    +      this.config = config;
    +    }
    +
    +    @Override
    +    public boolean hasNext() {
    +      if (!finished && null == reader) {
    +        try {
    +          reader = new SequenceFile.Reader(config, SequenceFile.Reader.file(path));
    +          LOGGER.debug("Writing file: " + path.toString());
    +        } catch (IOException e) {
    +          throw new RuntimeException("Failed to get reader", e);
    +        }
    +      } else {
    +        LOGGER.debug(format("finished=%s, reader=%s, next=%s", finished, reader, next));
    +      }
    +      try {
    +        //ensure hasnext is idempotent
    +        if (!finished) {
    +          if (null == next && reader.next(key, value)) {
    +            next = value.copyBytes();
    +          } else if (null == next) {
    +            close();
    +          }
    +        }
    +      } catch (IOException e) {
    +        close();
    +        throw new RuntimeException("Failed to get next record", e);
    +      }
    +      return (null != next);
    +    }
    +
    +    private void close() {
    +      LOGGER.debug("Closing file: " + path.toString());
    +      finished = true;
    +      try {
    +        if (reader != null) {
    +          reader.close();
    +          reader = null;
    +        }
    +      } catch (IOException e) {
    +        // ah well, we tried...
    --- End diff --
    
    Can we log this Exception?  I'm not sure we can do anything about it, but it would be nice to be able to ensure we can see it.
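    
    A minimal sketch of what that could look like is below. It is an assumption-laden illustration rather than the change made in this PR: it uses `java.util.logging` as a stand-in for the log4j `Logger` in the diff so that it runs without extra dependencies, and `QuietCloseSketch` and `closeAndLog` are hypothetical names.
    
    ```java
    import java.io.Closeable;
    import java.io.IOException;
    import java.util.logging.Level;
    import java.util.logging.Logger;
    
    // Hypothetical illustration; java.util.logging stands in for log4j here.
    final class QuietCloseSketch {
      private static final Logger LOGGER = Logger.getLogger(QuietCloseSketch.class.getName());
    
      // Closes the resource; a failure is logged with its stack trace instead of
      // being swallowed. Returns true on a clean close, false if close() threw.
      static boolean closeAndLog(Closeable resource, String description) {
        if (resource == null) {
          return true;
        }
        try {
          resource.close();
          return true;
        } catch (IOException e) {
          // Nothing useful can be done to recover, but the failure stays visible.
          LOGGER.log(Level.WARNING, "Failed to close " + description, e);
          return false;
        }
      }
    
      public static void main(String[] args) {
        Closeable failing = () -> {
          throw new IOException("simulated close failure");
        };
        System.out.println(closeAndLog(failing, "example reader"));
      }
    }
    ```
    
    With the log4j logger already declared in this class, the equivalent would be a single warn call that passes the exception as the second argument.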



[GitHub] incubator-metron pull request #256: Metron 257 Enable pcap result pagination...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-metron/pull/256



[GitHub] incubator-metron issue #256: Metron 257 Enable pcap result pagination from t...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/256
  
    It's troubling because that RNG is seeded and the results should be deterministic.  I wonder if the t-digest merge has some non-determinism.  Anyway, it needs to be fixed, but definitely not here.
    
    Great job on the PR, @mmiklavc +1 pending CI build.  Just close and reopen the PR and everything SHOULD be kosher.

