Posted to issues@metron.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/04/26 15:39:13 UTC
[jira] [Commented] (METRON-119) Move the PCAP topology from HBase
[ https://issues.apache.org/jira/browse/METRON-119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258103#comment-15258103 ]
ASF GitHub Bot commented on METRON-119:
---------------------------------------
GitHub user cestella opened a pull request:
https://github.com/apache/incubator-metron/pull/89
METRON-119 Move PCAP infrastructure from HBase
As it stands, the existing approach to handling PCAP data has trouble with high-volume packet capture. With the advent of a DPDK plugin for capturing packet data, we are going to hit throughput limits on consumption if we continue to push packet data into HBase at line speed.
Furthermore, storing PCAP data in HBase limits the range of filter queries we can perform to those expressible within the key. As of now, we require all fields to be present (source IP/port, destination IP/port, and protocol), rather than allowing any of them to be wildcards.
To address these issues, we should create a higher-performance topology which attaches the appropriate header to the raw packet and timestamp read from Kafka (as placed onto Kafka by the packet capture sensor) and appends this packet to a sequence file in HDFS. The sequence file will be rolled based on number of packets or time (e.g., one hour's worth of packets per sequence file).
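The roll condition described above (roll the sequence file once either a packet-count or a time threshold is hit) can be sketched in plain Java. This is an illustrative sketch, not Metron's actual implementation; the class and field names here are hypothetical:

```java
// Hypothetical sketch of the roll policy described above: the current
// sequence file is rolled when either a packet-count limit or a time
// limit (e.g., one hour) is exceeded, whichever comes first.
public class RollPolicy {
    private final long maxPackets; // roll after this many packets
    private final long maxAgeMs;   // roll after this much wall-clock time
    private long packetCount = 0;
    private long fileStartMs;

    public RollPolicy(long maxPackets, long maxAgeMs, long nowMs) {
        this.maxPackets = maxPackets;
        this.maxAgeMs = maxAgeMs;
        this.fileStartMs = nowMs;
    }

    /** Record one packet; returns true if the current file should be rolled. */
    public boolean onPacket(long nowMs) {
        packetCount++;
        return packetCount >= maxPackets || (nowMs - fileStartMs) >= maxAgeMs;
    }

    /** Reset the counters after the file has been rolled. */
    public void rolled(long nowMs) {
        packetCount = 0;
        fileStartMs = nowMs;
    }
}
```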
On the query side, we should adjust the middle-tier service layer to start an MR job over the appropriate set of sequence files to filter out the matching packets. NOTE: the UI modifications to make this reasonable for the end user will need to be done in a follow-on JIRA.
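The wildcard filtering that the MR job enables (and the HBase key could not) can be sketched as a simple predicate in which any null field matches everything. Again, the names and field set here are hypothetical, for illustration only:

```java
// Hypothetical wildcard filter for the MR job described above: a null
// field acts as a wildcard, so partial queries (e.g., only a source IP)
// become possible, unlike the all-fields-required HBase key.
public class PacketFilter {
    private final String srcIp, dstIp;       // null = wildcard
    private final Integer srcPort, dstPort;  // null = wildcard

    public PacketFilter(String srcIp, Integer srcPort, String dstIp, Integer dstPort) {
        this.srcIp = srcIp;
        this.srcPort = srcPort;
        this.dstIp = dstIp;
        this.dstPort = dstPort;
    }

    /** True if the packet's 4-tuple matches every non-wildcard field. */
    public boolean matches(String pSrcIp, int pSrcPort, String pDstIp, int pDstPort) {
        return (srcIp == null || srcIp.equals(pSrcIp))
            && (srcPort == null || srcPort == pSrcPort)
            && (dstIp == null || dstIp.equals(pDstIp))
            && (dstPort == null || dstPort == pDstPort);
    }
}
```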
In order to test this PR, I would suggest doing the following as the "happy path":
1. Install the pycapa library & utility via instructions [here](https://github.com/apache/incubator-metron/tree/master/metron-sensors/pycapa)
2. (if using singlenode vagrant) Kill the enrichment and sensor topologies via `for i in bro enrichment yaf snort;do storm kill $i;done`
3. Start the pcap topology via `/usr/metron/0.1BETA/bin/start_pcap_topology.sh`
4. Start the pycapa packet capture producer on eth1 via `/usr/bin/pycapa --producer --topic pcap -i eth1 -k node1:6667`
5. Watch the topology in the [Storm UI](http://node1:8744/index.html) and kill the packet capture utility from before when the number of packets ingested is over 1k.
6. Ensure that at least 2 files exist on HDFS by running `hadoop fs -ls /apps/metron/pcap`
7. Choose a file (denoted by $FILE) and dump a few of the contents using the `pcap_inspector` utility via `/usr/metron/0.1BETA/bin/pcap_inspector.sh -i $FILE -n 5`
8. Choose one of the lines and note the source IP/port and dest IP/port
9. Go to the Kibana app at [http://node1:5000](http://node1:5000) on the singlenode vagrant (ymmv on ec2) and input that query in the Kibana PCAP panel.
10. Wait patiently while the MR job completes; the results are sent back in the form of a valid PCAP payload suitable for opening in Wireshark.
11. Open the payload in Wireshark to ensure it is valid.
If the payload is not valid PCAP, then please look at the [job history](http://node1:19888/jobhistory) and note the reason for the job failure, if any.
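For step 11, "valid PCAP" concretely means the returned bytes begin with the standard 24-byte libpcap global header before the per-packet records. A minimal sketch of producing one (this is the standard pcap v2.4 format; how Metron's writer actually assembles it may differ):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Builds the standard 24-byte libpcap global header that must precede
// the packet records for Wireshark to open the file (pcap format v2.4).
public class PcapGlobalHeader {
    public static byte[] build(int snapLen) {
        ByteBuffer buf = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(0xa1b2c3d4); // magic number; its byte order tells readers the endianness
        buf.putShort((short) 2); // major version
        buf.putShort((short) 4); // minor version
        buf.putInt(0);           // thiszone: GMT-to-local correction, normally 0
        buf.putInt(0);           // sigfigs: timestamp accuracy, normally 0
        buf.putInt(snapLen);     // snaplen: max captured bytes per packet
        buf.putInt(1);           // network: link-layer type 1 = LINKTYPE_ETHERNET
        return buf.array();
    }
}
```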
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cestella/incubator-metron pcap_extraction_topology
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-metron/pull/89.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #89
----
commit fee3d0d327fccba464a62c75029cdc733a8d2a56
Author: cstella <ce...@gmail.com>
Date: 2016-02-26T14:41:37Z
Adding pcap infrastructure
commit d6da175e7b7072c585650d5604c7dd886177962c
Author: cstella <ce...@gmail.com>
Date: 2016-02-26T15:47:08Z
Updating kafka component to be more featureful.
commit 0a9dba8771939c28f65efd585fa501a4b0a4125b
Author: cstella <ce...@gmail.com>
Date: 2016-02-26T22:09:56Z
Updating topology and integration test.
commit 374b391c29621aac26fd9ef3ad54872f78a6e960
Author: cstella <ce...@gmail.com>
Date: 2016-03-01T15:00:58Z
Updating integration test.
commit 88d3d1572ca4b812c2438c8219434f2f19f1467d
Author: cstella <ce...@gmail.com>
Date: 2016-03-02T20:34:33Z
Fixed weird situation with HDFS, made the callback handle multiple partitions, added licenses
commit d485bfa7415fbc8a5c34c3cf327468ce37b07847
Author: cstella <ce...@gmail.com>
Date: 2016-03-03T01:18:50Z
Updating topology.
commit 8a9706bf012c8041e3a8ea08e8924073cde887f1
Author: cstella <ce...@gmail.com>
Date: 2016-03-03T02:22:11Z
Merging can be fun, but this one was not. Merging in master with some overlapping files from my feature branch that made their way into master via another feature.
commit d99cb74892ac2624d77895368874de76edd274d8
Author: cstella <ce...@gmail.com>
Date: 2016-03-14T15:35:55Z
Merging from master.
commit 3f8daa693decc815c4c0328be9dc6994ae8a4310
Author: cstella <ce...@gmail.com>
Date: 2016-03-14T17:59:56Z
Updating component runner and integration test.
commit 86771b087d4ef38f87333be5027c4935fa79173e
Author: cstella <ce...@gmail.com>
Date: 2016-03-16T21:00:38Z
Integrating a proper integration test and service layer call.
commit 3cd17f1823b92661426bf21ea618c50cbb1ae2bf
Author: cstella <ce...@gmail.com>
Date: 2016-03-17T19:36:31Z
Updating integration test.
commit 52fb7b28163267d4e321a5becc1c4a8e73eff3ea
Author: cstella <ce...@gmail.com>
Date: 2016-03-18T13:05:25Z
Updating integration test.
commit 6f1e24f96f3fa96319337fda6385babee4ed2abb
Author: cstella <ce...@gmail.com>
Date: 2016-03-18T15:06:10Z
Updating classpath issues.
commit ae8a5c1f55de5daa467bae7d32977175efc5b4bb
Author: cstella <ce...@gmail.com>
Date: 2016-04-05T19:18:01Z
Merged master into feature branch.
commit 542ee9e19b9ef2c371f95a8143cad307f6a44347
Author: cstella <ce...@gmail.com>
Date: 2016-04-07T13:35:21Z
merged master in.
commit 3705c4719b73613c1d8f559672bbbbb31b14ff02
Author: cstella <ce...@gmail.com>
Date: 2016-04-07T17:37:03Z
Reverting some very bad things that I did.
commit c7f837704f17510ed3881066fd9b50a3ed889f2b
Author: cstella <ce...@gmail.com>
Date: 2016-04-07T21:42:47Z
Fixing spout config and integration test
commit b25cdaad2cf59f6448fbca368f2c5b0103750735
Author: cstella <ce...@gmail.com>
Date: 2016-04-08T14:35:01Z
Making this work with pycappa as well.
commit 182c151901de23b6d98435762276cd2802e685ba
Author: cstella <ce...@gmail.com>
Date: 2016-04-08T15:09:36Z
Updating integration test to work with timestamp in the key as well as timestamp pulled from the data.
commit cc02302f8c4c55b380f3fbbf018ff21e74570819
Author: cstella <ce...@gmail.com>
Date: 2016-04-08T15:34:30Z
Moved around some stuff and realized I was not using unsigned comparisons.
commit e0d47a5aa94500b0954ae12449a270a5a2022830
Author: cstella <ce...@gmail.com>
Date: 2016-04-11T13:52:41Z
Headerizing in the converter.
commit 69f49959c470f1b73eb6d579661bcdc257c7010b
Author: cstella <ce...@gmail.com>
Date: 2016-04-11T13:56:42Z
Still have some weird serialization error, but will fix shortly.
commit f30595d151b823d23e1c8682343aafab6c45a30d
Author: cstella <ce...@gmail.com>
Date: 2016-04-11T20:10:14Z
Updating converters to implement serializable.
commit 09004e1f4566d4aad4ef349d6ddb013e1991c4b2
Author: cstella <ce...@gmail.com>
Date: 2016-04-19T12:45:10Z
Merge branch 'master' into pcap_extraction_topology
commit f52e57968b94591f0750659c3546403cd8d56e79
Author: cstella <ce...@gmail.com>
Date: 2016-04-19T21:01:34Z
Updating next gen pcap to include a notion of endianness that is configurable.
commit bce86caf5047d9fbb42995b90d6e1d1842ee3cb2
Author: cstella <ce...@gmail.com>
Date: 2016-04-19T21:16:22Z
Added licenses.
commit f8dc3460c6678ba5c0e83e0d7cb21dce854810bc
Author: cstella <ce...@gmail.com>
Date: 2016-04-19T21:30:41Z
updated licenses and added a global_shade_version because the one in Metron-Common was very old.
commit cb1288697de8da0cbfd6fc3b253ac3cbb40f698e
Author: cstella <ce...@gmail.com>
Date: 2016-04-20T12:54:56Z
Merge branch 'master' into pcap_extraction_topology
commit dfc3558496740d2429e755a7a23ca18943601e9f
Author: cstella <ce...@gmail.com>
Date: 2016-04-20T16:08:04Z
Moving stuff out of common.
commit f6e2567f21ef698531568593383ac732c7670a18
Author: cstella <ce...@gmail.com>
Date: 2016-04-20T20:17:39Z
We don't need to be configurable for the endianness..I can figure that out from the JVM.
----
> Move the PCAP topology from HBase
> ---------------------------------
>
> Key: METRON-119
> URL: https://issues.apache.org/jira/browse/METRON-119
> Project: Metron
> Issue Type: Improvement
> Reporter: Casey Stella
> Assignee: Casey Stella
> Original Estimate: 672h
> Remaining Estimate: 672h
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)